This is a short policy brief exploring the problems posed by Big Data companies and potential solutions.

I) Executive Summary

Big Data companies thrive on the unregulated control and processing of massive amounts of information. This has produced problems that are political, economic, and psychological. Politically, Big Data companies have accumulated disproportionate influence within the federal government and threaten to override local democracy where they operate. Economically, Big Data companies have become monopolies or duopolies, hindering competition. Psychologically, the products of Big Data companies may lead to adverse mental health effects. A final problem is the lack of consumer protection. Because these technologies and the corporate growth behind them are so recent, there are few precedents to draw on. Data protection laws have been emerging in Europe, and there has been limited American interest in regulating Big Data firms. The recommended options are to expand the literature on, and investigations of, these companies and data protection, so that larger efforts become feasible in the future.

I)a) A working definition of “big data”

The term big data refers to data sets so large that they must be analyzed computationally to reveal patterns, trends, and so on. In recent years, big data has increasingly been accumulated by common consumer devices: a good example is a person’s phone and social media applications collecting information for processing by the companies that administer those products. Technically speaking, big data can include things far from the popular understanding; generally speaking, “big data” refers either to the large-scale processing of information or to that information itself. For the purposes of this brief, I use “Big Data” à la “Big Tobacco” or “Big Oil”: to refer to the major technology companies that process “big data” as a large part of their business models. Such companies include the major social media networks (Facebook, Twitter, Instagram, etc.) and the other large technology firms discussed in this brief, such as Google (Alphabet), Amazon, Apple, and Microsoft. In this report, while “Big Data” will refer to companies, “big data” will refer to the literal vast quantities of information.

II) The four-dimensional problem of big data

One might reasonably wonder whether it is necessary to examine a problem at such a broad level. It is, because examining big data from this level reveals what connects many separate issues. The essential problem is that a new and rapidly developing technology has had very little regulation or oversight relative to its influence. Huge numbers of people regularly use services powered by big data technology, and as automation increases in many sectors of society, such technology is increasingly becoming a staple.

This brief will identify four main problems in big data, each of them broad in itself. This is an ambitious endeavor, but one that needs to be undertaken. Tackling these issues separately fails to address the root, and will do little to prepare us for the new problems that grow in their place. As such, I will examine four problems that all owe their roots directly to the unregulated processing of massive amounts of information that is big data. The first three can be characterized fairly neatly as problems in the realms of politics, economics, and psychological well-being, though overlap naturally exists. The final problem is that of data vulnerability, which entails an individual’s vulnerability both as a citizen and as a consumer.

II)a) Corporate Influence on Government

The difficulty with this dimension of the problem is that it extends quite easily to companies outside of Big Data. It is enabled by a larger set of circumstances not necessarily related to big data processing or technology-specific companies. Nonetheless, efforts aimed at taming unregulated big data processing may conceivably reduce undue corporate power.

The influence of major tech companies in government should not be underestimated. Popular imaginings of corporate influence on government tend towards imagery of bankers or the tobacco industry. From 2007 to 2017, however, the five largest tech firms outspent Wall Street in federal lobbying by a factor of 2:1.[1] Google’s parent company, Alphabet, spent more on lobbying than any other corporation in 2017.[2] The major tech companies have reached new records in money spent on lobbying: Facebook, Amazon, Alphabet, and Apple all broke company records, with Facebook increasing spending by 32% in 2017 and Apple increasing spending by 51% in 2017.[3] Big Data has not just lobbied, but has ensured a steady exchange of people between government and company positions: the “glass door” Google has built for itself includes 183 people who formerly worked for the federal government during the Obama administration and 58 ex-employees who have taken government jobs.[4]

Of course, the aforementioned facts concern the national level. Amazon’s search for a second headquarters offers an interesting case study in the degree to which a tech titan can overpower public, and particularly local, governance. The proposals sent by many cities in an effort to entice Amazon point towards private dominance of local government. For example, Chicago’s proposal to Amazon would have allowed it to collect $1.32 billion in income taxes paid by its workers; in other words, the Amazon workers in Chicago would effectively be paying their taxes to their bosses.[5] Fresno offered to put 85% of the taxes generated by Amazon into a fund that would be administered by a board made up half of Amazon representatives and half of city representatives: in other words, to allow Amazon direct control over the way tax money is spent.[6] It is not an overstatement to describe the situation this way: these are clear examples of local governments so desperate for a source of jobs that they will sacrifice local democracy.

To current knowledge, none of the aforementioned practices has been unlawful. The issue at hand is not whether such actions are legal (that is a matter for the national debate on lobbying regulations and campaign finance reform) but that inherent to the business models of Big Data is an enormous concentration of wealth and corporate power that has allowed for unprecedented levels of lobbying in an era already characterized by unprecedented levels of lobbying. Such lobbying has placed an enormous amount of leverage in the hands of a few people, a highly disproportionate situation, to say the least. The solutions discussed later, insofar as they better regulate Big Data companies, may also place some limitations on the concentration of corporate power.

II)b) Monopolies and the concentration of corporate economic power

A recurring economic debate in recent years has concerned the size of the major technology firms and whether they ought to be regulated. Historical precedents in antitrust law are used by both sides of the debate, but their force is limited because the business model of Big Data is in some ways unprecedented. The major arguments against treating these firms as monopolies focus on the consumer’s experience: the consumer is not significantly hurt by the current situation. Many of the products offered by Big Data are free, and innovations abound despite industry dominance. Moreover, there is generally no reduction in choice on the part of the consumer; as is often said, anyone can stop using Google with a couple of clicks. The counterargument is that although this is true now, innovation and low consumer prices were also characteristic of Standard Oil.[7] Moreover, although a person technically can use a different search engine, it is highly unlikely they will, as no other search engine can provide such robust results. A glance at industry dominance certainly ranks Big Data today with the monopolies of old.

Google holds 89% of internet search[8] and 88% of search advertising.[9] 95% of young adults use a Facebook product,[10] and Facebook owns 77% of mobile social traffic.[11] 75% of online book sales take place through Amazon.[12] Where Big Data companies are not literal monopolies, they are duopolies or oligopolies. 99% of mobile phone operating systems were provided by Google and Apple, and 95% of desktop operating systems were provided by Microsoft and Apple.[13] Google and Facebook took 63% of online advertising spending. Such percentages were defining characteristics of the great monopolies of history: Standard Oil in 1904 held an 87% market share, and General Electric in 1896 had 75%.[14]

These percentages are reflected in the rapid growth in company sizes: within roughly ten years, tech companies have become the largest companies by market capitalization. In 2006, the largest companies by market capitalization (in order) were Exxon Mobil, General Electric, Microsoft, Citigroup, and Bank of America.[15] In 2017, the largest companies were Apple, Alphabet, Microsoft, Amazon, and Facebook.[16] Moreover, the largest company in 2006 (Exxon Mobil) had a market capitalization of $540 billion, and the fifth largest was Bank of America at $290 billion.[17] In 2017, the fifth largest company by market capitalization was Facebook at $414 billion and the largest was Apple at $794 billion.[18] At the time of writing, May 2018, Apple’s market capitalization is $915 billion, putting it on track to be the first company worth $1 trillion by market capitalization.[19] If Bank of America and Citigroup were “too big to fail” in the Great Recession, it is quite possible that Big Data has far surpassed that threshold.
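As a rough illustration of the scale of this growth, the market capitalization figures quoted above can be compared directly. The short Python sketch below uses only the numbers cited in this section; the function names are my own and purely illustrative, and the figures are nominal (not adjusted for inflation), so the ratios should be read as approximate.

```python
# Market capitalizations in billions of US dollars, as quoted in this section.
# Nominal figures only; no inflation adjustment is applied (an assumption of this sketch).
LARGEST_2006 = 540      # Exxon Mobil
FIFTH_2006 = 290        # Bank of America
LARGEST_2017 = 794      # Apple
FIFTH_2017 = 414        # Facebook
APPLE_MAY_2018 = 915    # Apple at the time of writing
ONE_TRILLION = 1000


def growth_ratio(later: float, earlier: float) -> float:
    """How many times larger the later figure is than the earlier one."""
    return later / earlier


def percent_gain_needed(current: float, target: float) -> float:
    """Percentage increase required to move from current to target."""
    return (target - current) / current * 100


top_ratio = growth_ratio(LARGEST_2017, LARGEST_2006)
fifth_ratio = growth_ratio(FIFTH_2017, FIFTH_2006)
gain_to_trillion = percent_gain_needed(APPLE_MAY_2018, ONE_TRILLION)

print(f"Largest company, 2017 vs 2006: {top_ratio:.2f}x")
print(f"Fifth largest company, 2017 vs 2006: {fifth_ratio:.2f}x")
print(f"Gain Apple needs to reach $1 trillion: {gain_to_trillion:.1f}%")
```

On these figures, both the largest and the fifth largest companies of 2017 were a little under one and a half times the size of their 2006 counterparts, and Apple would need to grow by roughly 9% from its May 2018 valuation to reach $1 trillion.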

As noted above, the absence of obvious harm to consumers remains the main barrier to the political feasibility of regulation. However, limited bureaucratic efforts and statements from high-profile politicians may indicate room for a spot on the national agenda.

II)c) Psychological Effects

Growing evidence suggests that social media services, which are powered by big data processing, may cause mental health issues, particularly depression and attention deficits. At present, there appears to be a notable correlation between social media use and depression[20][21] or neuroticism,[22] and the bigger question has become whether there is a causal relationship. Additional emerging literature (from around the world) has found social media to be addictive, though such research is still unfolding.[23] The relationship between social media and the mental health of young people, who are often among the heaviest users of social media, is also a popular topic of investigation; growing evidence seems to indicate that younger people who use social media may be particularly at risk for mood disorders and attention deficit disorders.[24]

The lack of resounding evidence across the board that social media is harmful to mental health does not detract from the problem at hand. Big Data is new, and the effects of Big Data’s products are correspondingly not yet fully understood. Services powered by big data may be analogous to smoking in its early days, with the effects either not fully known or suppressed. However, as even emerging research suggests that Facebook and related sites could cause depression, reacting with caution to such technology is not inappropriate.

II)d) Data Vulnerability

This final dimension of the problem considers the individual both as a consumer and as a citizen, each endangered by a lack of privacy. As a citizen, the individual faces a vast reduction in privacy rights (which, by extension, makes them more vulnerable to abuse by the state), and as a consumer the individual is at a much higher risk of identity theft and hacking. As changing national security and intelligence agency practices requires extremely high levels of political effort, this report will focus on the consumer side of data safety.

As culture changes with technology, the fear many have (at least as Americans) of losing privacy to government surveillance programs may seem to be diminishing. However, there is a valid reason for any person to be concerned with their online privacy regardless of their stance on surveillance. The Equifax hacking provides an excellent example of how unprotected consumer data is. According to the latest figures on the security breach, personal information on 145 million consumers, including names, Social Security numbers, dates of birth, and addresses, was exposed, leaving it vulnerable to thieves.[25] The breach of 145 million consumers’ information is catastrophic, and it occurred at one of the largest credit reporting agencies in the world. It is possible Equifax is an outlier among major companies in its security protocols, but as there are almost no federal regulations mandating specific security measures for data collection, it is more likely that other major corporations have similarly loose protocols.

III) What Others Have Done and Theoretical Approaches

In an ideal world, any robust policy brief would provide previous examples of solutions independent of theoretical solutions. Unfortunately, the technology is so new and the problems so diverse that stepping into potential solutions requires a mixture of both.

III)a) Antitrust law

III)a)1) US vs. Microsoft

United States vs. Microsoft was a major antitrust ruling of recent history against a big technology company, although at the time Microsoft was not quite a data company the way it is now. The ruling required Microsoft to undo its mandated integration of operating system and web browser software. The ruling took place amidst clear harm to the consumer (with forced prices on products that did not face competition) and during a time in which Microsoft was less powerful than it is today. As Microsoft has moved towards a business model that prioritizes big data processing, the relevance of the ruling is limited; it is worth mentioning mainly as a precedent for antitrust enforcement dealing with software and technology.

III)a)2) Federal Trade Commission Reports on Google

In 2013 the Federal Trade Commission published the results of its investigation of Google’s business tactics.[26] The report concluded Google’s practices were not anti-competitive and did not warrant legal action.[27] However, a different report from 2012 concluded Google used anti-competitive tactics and abused monopoly power.[28] That report recommended the FTC sue Google, an action that, if undertaken, would have been the highest profile antitrust case since U.S. vs. Microsoft.[29] The report’s conclusion also noted Google’s practices resulted in harm to consumers as well as innovation, a notable comment in the face of popular anti-regulation arguments.[30] The report did not lead to significant action; however, it indicates that, at the highest levels of the bureaucracy responsible for economic regulation, there is some recognition of a case for antitrust action against Google and some motivation to pursue it. It also helps to establish legal documentation that could be useful in court cases and lawmaking in the future. The FTC did manage to fine Google a record $22.5 million for violating the privacy promise it made to consumers, but it remains to be seen whether such an action had any true impact.[31]

III)b) Data Ownership and Protection

This approach is partly theoretical and partly already in practice. Data ownership generally refers to the concept that a person owns or has some legal entitlement to their online data. Within the United States and most other countries, there are few to no examples. However, several European countries, and the European Union as a whole, have taken legal steps in this direction. Aside from ownership or rights to control of data, the EU’s recent General Data Protection Regulation marks a significant departure from American privacy standards.

III)b)1) Data Ownership in Europe

In Germany, there are no hard laws guaranteeing a right to one’s data, but there have been court rulings and legal findings that amount to some individual control over one’s data (legally, at least). For example, “the right of personality” (which protects a person’s right to develop their own personality) has been extended to one’s data.[32] The Spanish legal consensus goes further: the owner of a data-generating device, as opposed to the creator of the device, legally owns the data it generates as property.[33] However, this has yet to be tested in court.[34] The Spanish legal opinion on this has been echoed in Germany as a legal possibility, but has not been applied.[35] Although treating data as a form of personal property does not currently have widespread support or legal acceptance, limited forms of data rights or limited legal ownership exist in Europe and seem to be increasing, not decreasing.[36]

III)b)2) Data Protection in Europe: the General Data Protection Regulation

Though it does not confer ownership of data (i.e., data as a form of property) on individuals, the recent General Data Protection Regulation (GDPR) offers the individual significantly more legal control over their data than has been seen elsewhere in the world. The GDPR, to take effect May 25th, 2018, rests on the fundamental assumption that one has a right to privacy and to one’s information.[37] The GDPR confers the following rights on the individual: the right to be informed about one’s data; the right of access to one’s data; the right to rectify one’s data; the right to restrict the processing of such data; the right to data portability; the right to object; and additional rights concerning automated decision-making.[38] The GDPR specifically confers the “right to be forgotten”: an individual can request the deletion of their data, and the data-collecting entity must delete the data in its entirety (anywhere it exists in its storage infrastructure).[39] As the law will only go into effect a few days after the time of this writing, it is not yet known how this will affect Big Data. Nonetheless, it is a historical first.

Extending laws such as these to the United States obviously faces many obstacles, most of them matters of political feasibility. Nonetheless, the European Union is, alongside the United States, one of the world’s foremost economic powers, and its decision is sure to reverberate through the data debates.

III)c) The Campaign Against Big Tobacco

A grassroots approach to tackling the psychological effects of Big Data services has potential. The campaign against Big Tobacco in the 20th century was a remarkable case of public mobilization against private interests. Applying it to Big Data is mostly theoretical, however; the literature to date is not yet strong enough and must continue to develop before a well-informed public campaign can have legal effectiveness. Still, it is possible that boycotting would effectively force Big Data companies to make some adjustments to their services. Social media services are easier to boycott because they are nonessential and free, though search engine use could be more difficult to give up. Political feasibility may open up if the debate is centered on the effects of social media on children, as their ability to consent is more in question than that of an adult. Moreover, the brains of children and adolescents are still developing, and the effects of new technologies on such development are not yet understood. Centering a campaign around children’s health, particularly when there is no cost to boycotting, has the potential to be the most effective option here. However, it is largely theoretical, with tremendous amounts of conjecture, and for that reason this approach must be investigated cautiously.

IV) Policy Recommendations

Making policy recommendations on this topic is naturally difficult. The problem is multifaceted, and there is a limited number of examples to draw from, perhaps none in the modern era relevant to the unique situation produced by big data technology. Nonetheless, the root problem stands, and it would seem an accompanying policy recommendation should be similarly multifaceted.

Regulation of Big Data, particularly through antitrust law, seems to offer the most potential impact without being completely politically unfeasible, although it is not the most politically feasible endeavor either. As such, the recommended option is to emphasize additional Federal Trade Commission reports and federal investigations. It is possible that new, smaller legal challenges could produce precedent for larger antitrust rulings later down the line. Such research must focus on data, as it cements corporate monopoly power. The same options are recommended for data privacy and consumer-protection efforts. Obstacles are high, but by tackling the problem from the angle of monopoly power and from the angle of data rights, ground could be gained for the future, when there are better opportunities for larger legal battles. For now, the recommendations are to focus on expanding literature, reporting, and investigation to produce the framework necessary for larger efforts. Grassroots efforts may offer political feasibility for restricting social media use for certain age groups, but such a move is contentious and the academic literature is not yet complete enough to justify such actions.

