The Cruz Campaign has paid one firm over $4.9 million for data analytics. Here's why that's important

Author's Note: This is a very rough outline of an ongoing research project. Comments are welcome.

The Problem

The era of data analytics and computing has been heralded as a new age of knowledge and an all-purpose tool for solving current and future problems. While industry and online commerce trace the clicks and purchases of online users to better market and sell products, political campaigns are finding data tools critical to winning close elections. Using data and predictive voter models, campaigns can identify factors from a voter’s past political and purchasing history to communicate a specific, tailored message (Kreiss 2012). These sophisticated messages are used not only for voter outreach, but also for get-out-the-vote (GOTV) mobilization and fundraising. In the 2016 Republican Iowa caucus, Senator Ted Cruz of Texas reportedly outperformed polling and won thanks to a ground game aided by sophisticated internal data (Hamburger 2016). Similarly, many other campaigns have used sophisticated data sets and predictive voter models, most prominently the Obama for President campaigns in 2008 and 2012 (Kreiss 2012). In addition, a private cottage industry of firms has emerged to offer its services to campaigns willing to pay.

However, the rise of sophisticated data analytics raises three major concerns. First, there are significant privacy concerns about marrying commercial and political data and making predictions based on the combined data. Data is not always perfect, and its misapplication may leave voters (or entire communities) outside of the political process. Second, even when data is applied accurately, there are troubling implications for democracy. Instead of speaking to the broadest political audiences, campaigns can use data to find the voters most sympathetic to, and persuadable by, their message. In addition, campaigns with the most financial resources and technical knowledge are better positioned to use data analytics to their advantage. Finally, and my own primary concern, is the redlining potential of political micro-targeting in minority communities. When political micro-targeting is based on past voting history, historically disenfranchised communities of color are less likely to be targeted by future campaigns. Often, non-voters do not engage in the political system because they have never been brought into the system (Need Citation). When campaigns with limited resources face the difficult choice between reaching out to new potential voters and mobilizing likely supporters, they will usually choose to make sure their likely voters get out and vote.

The Question

This paper is intended as an initial investigation into a much larger question. A longer-term, large-scale investigation into how political micro-targeting practices are being used to speak to (or not speak to) minority communities is outside the scope of this paper. Similarly, attempting to determine what perceptions and values of U.S. Latinos are being coded into these systems and applications presents methodological issues that are also outside its scope. As a result, this paper focuses on a narrower, but important, initial question: what are the data collection practices of the major 2016 presidential campaigns? A related sub-question is: what are the political and economic relations between campaigns and private data collection firms?

Policy Implications

There are significant policy issues at play in data collection and political micro-targeting. The privacy concerns become more troubling when political data is combined with health and financial data. Limiting political interaction to sympathetic voters also raises issues for deliberative democracy. Finally, there are issues of equity when campaigns choose to ignore minority voters.

Specifically, for the purposes of this paper, political campaigns are combining voter data with data from other sources, including commercial databases as well as user-provided and extracted data. Campaigns can gather data not just from user-provided information such as age, gender and location, but also through browser-enabled technologies. For example, campaign websites will typically track which website a user came from (Google, Facebook, a blog) in order to review their advertising strategies.
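To make the referrer tracking described above concrete, the following is a minimal sketch of how a website could turn the browser-supplied Referer header into a coarse traffic-source label. It is an illustration only, not any campaign's actual code: the function name classify_referrer and the domain-to-label mapping are assumptions introduced here.

```python
from urllib.parse import urlparse

# Hypothetical mapping from referring domains to traffic-source labels.
# A real site would maintain a much longer list.
SOURCE_BY_DOMAIN = {
    "google.com": "search",
    "facebook.com": "social",
}


def classify_referrer(referer_header):
    """Return a coarse traffic-source label for a Referer header value."""
    if not referer_header:
        # No header: the visitor typed the URL, used a bookmark,
        # or the browser withheld the referrer.
        return "direct"
    host = (urlparse(referer_header).hostname or "").lower()
    for domain, label in SOURCE_BY_DOMAIN.items():
        # Match the domain itself or any subdomain (www., m., etc.).
        if host == domain or host.endswith("." + domain):
            return label
    return "other"
```

Even this small amount of logic shows that the visitor supplies the information passively: nothing is typed into a form, yet the site learns where the visit originated.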

As such, these practices raise serious questions about privacy. Should campaigns be allowed to use commercial data to reach out to voters? Many campaigns have privacy policies and user agreements, but many of these are out of sight and hard to find. In addition, opting out of data collection equates to opting out of using the site. Thus another significant policy question emerges: should campaigns be required to provide an “opt out” choice to site visitors? Finally, these political leaders often hold divergent opinions on large-scale surveillance policy, such as the data monitoring done by the National Security Agency. Should we doubt potential leaders’ commitment to privacy when their campaigns engage in similarly invasive techniques?

Literature Review

Daniel Kreiss, in his 2012 article “Yes We Can (Profile You),” argues that “gathering and acting on data about the electorate has a long history, but the sheer expanse of data now gathered and stored about the electorate and the modeling and targeted communications it supports are qualitatively new.” Campaigns have long used polls and other technologies of political knowledge gathering to develop a comparative advantage over other campaigns. Polls, and more generally the goal of quantifying political participation and opinion, have long been a feature of American politics. In Numbered Voices: How Opinion Polling Has Shaped American Politics, Susan Herbst argues that the quantifying nature of polls is expressly linked to maintaining a rational democracy. In addition, she argues that polls have been used in America to command authority and control political history. Jean Converse, in Survey Research in the United States: Roots and Emergence 1890-1960, argues that the first modern polls, which emerged in the 1930s, had three historical legacies: the study of social conditions in Britain, the study of attitudes by psychologists, and marketing research. Technologies and artifacts always have politics embedded within them, even if those politics remain unseen (Winner 1986), and in the case of polling technologies, the elevation of quantification and claims to objectivity have political ramifications.

A critical view of knowledge gathering situates political campaigns as an apparatus of the state’s desire for more information about its subjects. Even in a liberal democracy, knowledge about citizens is critical to governance, whether for beneficial causes such as public health, (arguably) benign causes such as the census, or malignant causes such as segregation. Political polls and representations of public opinion have shaped American politics more with every cycle, with cable news channels filling more airtime with speculation and punditry based on polls. In a Foucauldian sense, the goal of turning people into easily categorizable subjects through knowledge production is a function of power. In Discipline and Punish, Foucault says, “There is no power relation without the correlative constitution of a field of knowledge, nor any knowledge that does not presuppose and constitute at the same time power relations.”

In sum, the literature review of this paper will focus on the recent and long history of data collection, the power relations inherent in data collection and application and a brief review of the technical mechanisms of modern data collection and voter micro-targeting in campaigns.

Methodology

Data analytics may evoke images of number crunching, algorithmic calculations and other heavily quantitative work. However, this numbers-heavy world may actually call for a more in-depth, qualitative analysis of what is going on. Qualitative analysis as a methodological paradigm focuses on the meaning-making behind social and communicative phenomena.

This investigation takes the privacy policies of the major 2016 presidential campaigns[1] as its unit of analysis. Supplemental contextual information will be drawn from publicly available interviews, websites and reporting.

The method used to analyze privacy policies will be summative content analysis, a type of qualitative content analysis (Hsieh and Shannon 2005). Hsieh and Shannon define summative content analysis as a method that “involves counting and comparisons, usually of keywords or content, followed by the interpretation of the underlying context” (1). While qualitative content analysis does involve counting and quantification, it ultimately focuses on the meanings and themes that emerge from the text.
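The counting and comparison step of a summative content analysis can be sketched in a few lines of Python. This is only an illustration of the counting stage under stated assumptions: the keyword list, the function names count_keywords and compare_policies, and the simple prefix matching are all hypothetical choices made here, not the coding scheme of this project, and the interpretive work that follows the counting cannot be automated this way.

```python
import re
from collections import Counter

# Hypothetical keywords; a real study would derive these from the research question.
KEYWORDS = ["share", "third party", "opt out", "location"]


def count_keywords(text, keywords=KEYWORDS):
    """Count occurrences of each keyword in one privacy policy."""
    # Lowercase and treat hyphens and whitespace runs as single spaces,
    # so that "opt-out" and "opt out" are counted together.
    normalized = re.sub(r"[\s\-]+", " ", text.lower())
    counts = Counter()
    for keyword in keywords:
        # Prefix match at a word boundary: "share" also counts "shares" and "shared".
        pattern = r"\b" + re.escape(keyword) + r"\w*"
        counts[keyword] = len(re.findall(pattern, normalized))
    return counts


def compare_policies(policies, keywords=KEYWORDS):
    """Map each campaign name to its keyword counts for side-by-side comparison."""
    return {name: count_keywords(text, keywords) for name, text in policies.items()}
```

The resulting counts are only the starting point: a keyword such as “share” appearing often in one policy and rarely in another flags passages to read closely, and the reading is where the interpretation of context happens.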

David Karpf (2012) notes the difficulty of studying the internet, as “the internet of 2002 has important differences from the internet of 2005, or 2009 or 2012.” Karpf continues, “the medium is simultaneously undergoing a social diffusion process and an ongoing series of code-based modifications. Social diffusion brings in new actors with diverse interests. Code-based modifications alter the technological affordances of the media environment itself.” Indeed, the constantly evolving nature of the internet presents a methodological challenge. Kreiss, in his 2015 article, notes that “it is exceptionally difficult to generalize findings about digital campaigning from one-time period to another.” The further adoption of individual user data mining may also mean that different users are shown different internet content, which will be increasingly difficult to document.

Results

On the evening of February 1st, following one of the longest electoral prologues in American history, Ted Cruz was declared the victor of the 2016 Republican Iowa caucus. Although he secured only slightly over a quarter of the vote, in a field of 12 candidates that was enough for a plurality. Cruz’s victory also came as a bit of a shock to some, as polls had indicated Donald Trump was poised to win the first-in-the-nation caucus. It seems that superior data analytics, and the superior field organization that resulted, helped Cruz win the day.

In late 2015, the Washington Post reported that the Cruz campaign had organized a

“team of statisticians and behavioral psychologists who subscribe to the burgeoning practice of “psychographic targeting” built their own version of a Myers-Briggs personality test. The test data is supplemented by recent issue surveys, and together they are used to categorize supporters, who then receive specially tailored messages, phone calls and visits. Micro-targeting of voters has been around for well over a decade, but the Cruz operation has deepened the intensity of the effort and the use of psychological data.”

Cruz, who has in the past been critical of the widespread monitoring and tracking of American citizens’ data by the National Security Agency, has a campaign apparatus that collects much more personal data about its supporters and potential supporters.

For example, the Washington Post report notes that the Cruz campaign was able to identify at least 90 voters whose primary political concern was the statewide ban on fireworks. Yes, fireworks. In addition, using psychological data, the Cruz campaign was able to broadly identify two thematic objections to the ban, “fun-loving” and “libertarian.” Fun-lovers objected to the ban because they viewed fireworks as an essential component of family and community celebrations such as the 4th of July. Libertarians objected on philosophical grounds to the government banning their sale. Using this data, the Cruz campaign was able to target unique messages to both groups.

The Cruz campaign website has a privacy policy that details its data collection practices. For example, the website collects personal information such as “[t]heir name, address, telephone number, cell phone number, email address, voter registration history, or credit card number as well as information about your activities on this site when it is linked with other information that would enable a reader to identify you.” In addition, the campaign automatically collects data beneath the content layer, such as

“your mobile device’s unique ID number, your mobile device’s geographic location while the app is actively running, your computer’s IP address, technical information about your computer or mobile device (such as type of device, web browser or operating system), your preferences and settings (time zone, language, privacy preferences, product preferences, etc.), the URL of the last web page you visited before coming to one of our sites, the buttons, controls and ads you clicked on (if any), how long you used our website or app and which services and features you used, and the online or offline status of Cruz Crew.”

The campaign also notes that it reserves the right to share a voter’s data with service providers, ideologically or politically aligned organizations and analytics companies.

The Cruz campaign’s FEC filings underscore its heavy reliance on digital data collection, analytics, and communication. Cambridge Analytica, the firm mentioned in the Washington Post article, specializes in using

“data modeling and psychographic profiling to grow audiences, identify key influencers, and connect with people in ways that move them to action. Our unique data sets and unparalleled modeling techniques help organizations across America build better relationships with their target audience across all media platforms”

As of the most recent FEC filing date, the Cruz campaign has paid Cambridge Analytica over $4.9 million since October of 2015. In addition, the Cruz campaign has paid over $1 million for a voter list from Targeted Victory, another data firm. Targeted Victory advertises itself as “audience specific, screen agnostic.” Overall expenditures to Targeted Victory total more than $2.5 million.

These are, of course, preliminary results, and the other campaigns have yet to be examined. For example, the Donald Trump campaign claims not to collect personally identifiable information, but then details all the ways personal data is collected by third parties.

However, one clear preliminary result is that the cost of data collection is incredibly high. Data is becoming extremely valuable, resembling a digital currency that most users don’t even recognize as currency. If physical currency such as coins and banknotes has implied commoditized value, then data currency has implicit commoditized information value. As a result, the rise of data micro-targeting is explicitly linked to the tremendous amount of money raised and spent by political campaigns.

Working Bibliography

Hamburger, T. (2015, December 13). Cruz campaign credits psychological data and analytics for its rising success. The Washington Post.

Karpf, D. (2010). Online Political Mobilization from the Advocacy Group's Perspective: Looking Beyond Clicktivism. Policy & Internet, 2(4), 7-41. doi:10.2202/1944-2866.1098

Kreiss, D. (2012). Yes We Can (Profile You) A Brief Primer on Campaigns and Political Data. The Stanford Law Review, 64, 70-74.

Kreiss, D. (2015). Digital campaigning. In S. Coleman & D. Freelon (Eds.), Handbook of digital politics (pp. 118-135). Cheltenham, UK: Edward Elgar.

[1] For the purposes of this paper, defined as competing in at least one primary or caucus. A campaign that dropped out prior to voting would not be considered.