Why (almost) everything reported about the Cambridge Analytica Facebook ‘hacking’ controversy is wrong

Reposted from Medium.

If you follow the Guardian or the New York Times, or any major news network, you are likely to have noticed that a company called Cambridge Analytica have been in the headlines a lot.

The basic story as reported is as follows:

A shady UK data analytics company, with the help of a 24-year-old tech genius, developed an innovative technique to ‘hack’ Facebook and steal 50 million user profiles. Then they used this data to help the Trump and Brexit campaigns psychologically manipulate voters through targeted ads. The result was that Vote Leave ‘won’ the UK’s Brexit referendum and Trump was elected president in the US.

Unfortunately, almost everything in the above summary is false or misleading.

First, there was no hack.

The data collected was scraped from Facebook user profiles, after users granted permission for a third-party app to access their data. You know those little confirmation windows that pop up when someone wants to play Candy Crush, or to log in to a random site with Facebook rather than make a new password? Yeah, those.

A Cambridge academic called Aleksandr Kogan (NOT Cambridge Analytica and NOT the whistleblower Christopher Wylie) made a ‘Test Your Personality’ app, helped to promote it by paying people $1 through Amazon’s Mechanical Turk crowdsourcing site to install it, and used the permissions granted to harvest profile data. 270,000 users installed the app, so you might expect that 270,000 profiles were collected, but the app actually collected data from 50 million profiles.

50 million?!?

Yes. You see, back in the heady days of 2014, Facebook had a feature called ‘friends permission’ that allowed developers to access the profiles of not only the person who installed their app but all of that person’s friends too. The only way to prevent this was to toggle a privacy setting that few Facebook users even knew existed (here is a blog from 2012 explaining how to do so). The friends permission feature is how Kogan multiplied 270,000 permissions into 50 million profiles’ worth of data.
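The fan-out is easy to see in a toy model. The sketch below is a hypothetical illustration, not Facebook’s actual API: it assumes each installer exposes their own profile plus every friend’s profile, and it takes a set union so that friends shared between installers are only counted once. The last lines simply divide the reported figures to show the implied multiplier.

```python
from typing import Dict, Iterable, Set


def harvested_profiles(installers: Iterable[str], friends: Dict[str, Set[str]]) -> Set[str]:
    """Return every profile an app could read under a 'friends permission' style model.

    Hypothetical model: each installer grants access to their own profile
    plus the profiles of all their friends. Friends shared between
    installers are counted once, hence the set union.
    """
    reached: Set[str] = set()
    for user in installers:
        reached.add(user)                    # the person who installed the app
        reached |= friends.get(user, set())  # ...plus every one of their friends
    return reached


# Toy network with hypothetical names: two installers who share one friend (carol).
friends = {
    "alice": {"bob", "carol", "dave"},
    "erin": {"carol", "frank"},
}
profiles = harvested_profiles(["alice", "erin"], friends)
print(len(profiles))  # 2 installs expose 6 distinct profiles

# Back-of-the-envelope multiplier implied by the reported figures.
installs = 270_000
reported_profiles = 50_000_000
print(round(reported_profiles / installs))  # roughly 185 profiles per install
```

On the reported figures, each install exposed roughly 185 profiles on average, which is why a modest number of paid installs could yield tens of millions of profiles.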

That Facebook users were having their data shared by their friends without their knowledge or permission was a serious concern that many privacy advocates noted at the time. So in 2015, facing growing criticism and pressure, Facebook removed the feature, citing a desire to give its users “more control”. This decision caused consternation amongst developers, as the ability to access friends’ profiles was extremely popular (see the comments under this 2014 post from Facebook announcing the changes). Sandy Parakilas, an ex-Facebook manager, told Bloomberg that “tens or maybe even hundreds of thousands of developers” were making use of the feature before it was discontinued.

To recap, there are two key points to remember:

  1. None of what I just described involves ‘hacking’ Facebook or exploiting a bug. Instead, it all revolves around the use of a feature that Facebook provided to all developers and that (at least) tens of thousands took advantage of.
  2. The data collected was not internal Facebook data. It was data that developers scraped from the profiles of people who downloaded their apps (and their friends). Facebook has a lot more data on users than is publicly available, and it has it for everyone who uses its platform. No-one but Facebook has access to that data. This is a point that almost all the journalists involved seem unable to grasp; instead, they repeatedly equate ‘Facebook’s internal data’ with ‘data scraped from Facebook profiles using a third-party app’. But these are VERY different things.

The importance of this second point becomes apparent when you read exchanges like this one:

Simon Milner, Facebook’s UK policy director, when asked if Cambridge Analytica had Facebook data, told MPs: “No. They may have lots of data, but it will not be Facebook user data. It may be data about people who are on Facebook that they have gathered themselves, but it is not data that we have provided.”

This exchange is being reported as evidence that Facebook lied to politicians about its relationship with Cambridge Analytica. But when you understand the difference between Facebook’s internal data and data collected on Facebook by outside developers it is clear that what Facebook’s policy director is saying is very likely true.

So where does Cambridge Analytica come into the story?

Well, they paid Kogan to collect those 50 million profiles. Whose idea that was originally is currently a matter of ‘he said, she said’. Kogan says Cambridge Analytica approached him and Cambridge Analytica says Kogan came to them. Whatever the case may be, this is the part of the story where there was an actual breach; not of Facebook’s internal data but of Facebook’s data sharing policies. Developers were permitted to collect all the user data they wanted from their apps, but what they were not allowed to do — even back in 2014 — was take that data and sell it to a third party.

And yet, regardless of Facebook’s official policies, it seems that they did not expend much effort to police their developers or track how the data they collected was being used. This is likely why, when Facebook first uncovered that Kogan had sold some data to Cambridge Analytica in 2015, they were content to receive written confirmation from both that the data had been deleted.

The fact that there were (at minimum) tens of thousands of developers with access to such information meant that it was inevitable that data harvested on Facebook was being sold, or otherwise provided, to a wide array of third parties. Again, the disgruntled ex-Facebook manager confirmed as much:

Asked what kind of control Facebook had over the data given to outside developers, he replied: “Zero. Absolutely none. Once the data left Facebook servers there was not any control, and there was no insight into what was going on.” Parakilas said he “always assumed there was something of a black market” for Facebook data that had been passed to external developers.

So, given how prevalent Facebook data harvesting was, and that many developers had more than 270,000 users to harvest from, why is Cambridge Analytica receiving so much media attention?

The answer seems to lie primarily in how journalists, particularly Carole Cadwalladr at the Observer, have framed the story. The majority of coverage has pushed two angles: first, that a whistleblower from Cambridge Analytica revealed ‘a major breach’ of Facebook’s data, an issue covered above; and second, that this ‘breach’ was linked to the success of Trump’s presidential campaign.

Chris Wylie, the mastermind who ‘hacked’ Facebook…

This second angle is as dubious as the first and relies heavily on bombastic claims made by Chris Wylie, the pink-haired ex-Cambridge Analytica employee pictured above. Carole Cadwalladr, who spent years on the story, has explained in various interviews that she approached it not as an investigative journalist but as a features writer. This meant that she focused on delving into ‘the human side of the story’, or, put another way, Chris Wylie. There are pros and cons to such an approach, but the biggest drawback is how invested in, and reliant on, Wylie’s narrative it made both her and the coverage that followed: a narrative that just so happens to paint him as a young mastermind at the center of global political conspiracies.

Cadwalladr fully endorses Wylie’s presentation and fawningly describes him as: “clever, funny, bitchy, profound, intellectually ravenous, compelling” … “impossibly young” … “His career trajectory has been, like most aspects of his life so far, extraordinary, preposterous, implausible” … “Wylie lives for ideas. He speaks 19 to the dozen for hours at a time” … “when Wylie turns the full force of his attention to something — his strategic brain, his attention to detail, his ability to plan 12 moves ahead — it is sometimes slightly terrifying to behold” … “his suite of extraordinary talents include the kind of high-level political skills that makes House of Cards look like The Great British Bake Off.”

Wow… what a guy.

Cadwalladr’s person-focused approach might make for more accessible articles, but it also helps to obscure the relevant technical details in favour of sensationalist quotes and personal anecdotes from Wylie and his friends and coworkers. Such details could be insightful if they were subjected to sufficient critical examination, but this rarely occurs. Cadwalladr, instead, seems to have entirely bought into Wylie’s narrative: “by the time I met him in person, I’d already been talking to him on a daily basis for hours at a time.”

So let’s address the oversight and take a bit more of a critical look at what Wylie’s narrative claims:

  • That Steve Bannon wanted to weaponize big data… No difficulty believing.
  • That Cambridge Analytica claims to be able to provide effective tools for psychological targeting and manipulation… Certainly true.
  • That Chris Wylie, himself, was involved with some shady business and views himself as partly responsible… Sure.
  • That the self-promotional claims of Cambridge Analytica actually equate to how effective the services they provide are… Hmmmm.

This last point is the most important and yet it is also the one lacking almost any supporting evidence.

The temptation might be to point to Trump’s surprising victory but there are a lot of confounding factors there. Trump won, yes. But he won against the most unpopular Democratic candidate in modern history, who was vying for a third Democratic term (something which had not been achieved since the 1940s). Furthermore, he won by a very slim margin and actually lost the popular vote.

Alexander Nix, CEO of Cambridge Analytica, standing in front of lots of impressive graphs!

Could all that just be evidence of how precise Cambridge Analytica’s psychological targeting was? Maybe, but we start to run into the perils of dealing with an unfalsifiable hypothesis. A better approach would be to look at Cambridge Analytica’s relative record of success and failure. Unfortunately, we do not have access to their full client list, but we do know that when they first rose to prominence they were working for the Ted Cruz presidential campaign. That’s right, Ted Cruz: the Republican senator who was crushed by Trump in the Republican primaries, despite having the power of Cambridge Analytica at his command. I am not the first to notice this apparent contradiction; Martin Robbins made the same point on Little Atoms last year:

So the story of the Republican primaries is actually that Cambridge Analytica’s flashy data science team got beaten by a dude with a thousand-dollar website. To turn that into this breathtaking story of an unbeatable voodoo-science outfit, powering Trump inexorably to victory, is quite a stretch. Who else have they even worked for? Without a list of clients it’s very easy to cherry-pick the winners.

The techniques that Cambridge Analytica purport to use involve building algorithms from social network data that can accurately predict what kind of messages will be effective, given an individual’s personality and psychology. This is what the stories mean when they talk about using psychographics to micro-target voters. But many of the claims being made about the effectiveness of such techniques are wildly exaggerated. Kogan, the Cambridge academic at the heart of the controversy, has made similar arguments. He claims that he is being scapegoated and argues that the personality profiles he gathered turned out not to be particularly useful for making the predictions needed for micro-targeting:

“In fact, from our subsequent research on the topic,” he wrote, “we found out that the predictions we gave SCL were 6 times more likely to get all 5 of a person’s personality traits wrong as it was to get them all correct. In short, even if the data was used by a campaign for micro-targeting, it could realistically only hurt their efforts.”
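Kogan’s ratio is easier to judge against a baseline. Purely as an illustration, and under a simplifying assumption made here rather than one Kogan states (each of the five traits treated as an independent high/low call made with the same accuracy p), the ratio of ‘all wrong’ to ‘all correct’ is ((1 − p)/p)^5. Random coin-flip guessing makes the two outcomes equally likely (1/32 each), a ratio of 1, so a ratio of 6 would imply each call was right only about 41% of the time, which is worse than chance. The hypothetical helper below just solves that equation for p.

```python
def implied_per_trait_accuracy(ratio_wrong_to_right: float, k: int = 5) -> float:
    """Solve ((1 - p) / p) ** k == ratio_wrong_to_right for p.

    Toy assumption (not Kogan's own model): each of the k traits is a
    binary high/low call, predicted independently with the same accuracy p,
    so P(all wrong) / P(all right) = ((1 - p) / p) ** k.
    """
    odds_wrong = ratio_wrong_to_right ** (1.0 / k)  # this is (1 - p) / p
    return 1.0 / (1.0 + odds_wrong)


print(implied_per_trait_accuracy(1.0))            # coin-flip baseline: 0.5
print(round(implied_per_trait_accuracy(6.0), 3))  # roughly 0.41, below chance
```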

Kogan is hardly an impartial source, but his claim accords with various studies that have shown less-than-stellar results for nefarious social media manipulation. Take, for instance, the controversial Facebook ‘mind control’ study, which I’ve heard several journalists reference in recent days. What always seems to be missing from reporting on this study is just how underwhelming its results were.

Facebook ran an experiment on around 689,000 users in which it tweaked the algorithm running their news feeds to show fewer of their friends’ status updates containing positive or negative words. As any researcher knows, with such a large sample you are almost guaranteed to find statistically significant differences between groups, even when the real difference is trivial. A more important criterion with such massive groups is how large the observed effect was. In the Facebook study this equated to a truly terrifying difference: those who saw fewer negative updates used around 0.05 more positive words out of every 100 words in their status updates, whereas those who saw fewer positive updates used around 0.1 fewer positive words per 100. That’s right: Facebook might have been able to manipulate people into using around one fewer positive word for every 1,000 words in their updates. It would be wrong to paint this as Facebook being powerless, since bigger interventions would have bigger effects, but it is important to keep things in perspective.

Note the starting point of the y-axis. There is a reason it isn’t 0.
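The point about sample size can be made concrete. The sketch below uses a normal-approximation two-sample z-test with hypothetical numbers chosen for illustration, not the study’s raw data: a difference of 0.1 positive words per 100, an assumed standard deviation of 5 words per 100, and either 1,000 or 150,000 users per group. The helper name `two_group_z_test` is likewise hypothetical.

```python
import math
from typing import Tuple


def two_group_z_test(mean_diff: float, sd: float, n_per_group: int) -> Tuple[float, float, float]:
    """Return (z, two-sided p, Cohen's d) for a difference in means between two
    equal-sized groups that share a common standard deviation (normal approximation)."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z = mean_diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value from the standard normal
    d = mean_diff / sd                      # effect size: the difference in SD units
    return z, p, d


# Hypothetical inputs: a 0.1-word-per-100 difference, SD of 5 words per 100.
for n in (1_000, 150_000):
    z, p, d = two_group_z_test(mean_diff=0.1, sd=5.0, n_per_group=n)
    print(f"n={n:>7}: z={z:.2f}, p={p:.2g}, d={d:.2f}")
```

With 1,000 users per group the difference is nowhere near significant (p ≈ 0.65); with 150,000 per group the identical, tiny effect (Cohen’s d = 0.02) produces a p-value far below 0.001. The p-value changes with the sample size; the size of the effect does not.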

The real story, then, is not that Kogan, Wylie, and Cambridge Analytica developed some incredibly high-tech ‘hack’ of Facebook. It is that, aside from Kogan’s data selling, they used methods that were commonplace and permitted by Facebook prior to 2015. Since the story broke, Cambridge Analytica has been outed as a rather obnoxious, unethical company, at least in how it promotes itself to potential clients. But the majority of what is being reported in the media about its manipulative power is just an uncritical regurgitation of Cambridge Analytica’s (and Chris Wylie’s) self-promotional claims. The problem is that there is little evidence that the company can do what it claims and plenty of evidence that it is not as effective as it likes to pretend; see the fact that Ted Cruz is not currently president.

No one is totally immune to marketing or political messaging, but there is little evidence that Cambridge Analytica is better than other similar PR or political canvassing companies at targeting voters. Political targeting and disinformation campaigns, including those promoted by Russia, certainly had an impact on recent elections, but were they the critical factor? Did they have a bigger impact than Comey announcing he was ‘reopening’ the Hillary email investigation the week before the US election? Or Brexiteers claiming that £350 million was being stolen from the NHS by the EU every week? Colour me skeptical.

To be crystal clear, I’m not arguing that Cambridge Analytica and Kogan were innocent. At the very least, it is clear they were doing things that were contrary to Facebook’s data sharing policies. Similarly, Facebook seems to have been altogether too cavalier in permitting developers to access its users’ private data.

What I am arguing is that Cambridge Analytica are not the puppet masters they are being widely portrayed as. If anything, they are much more akin to Donald Trump: making wildly exaggerated claims about their abilities and getting lots of attention as a result.

20 comments

  1. The accuracy in your post just earned you a long-term follow. Would you mind if in the near future I feature your website on Sutter Media as a place I’d recommend to my subscribers?

  2. So glad I read this. Well written and persuasive analysis and an effective antidote to all the hype in the MSM. Will be following you from now on. Well done.

  3. God. Finally!! This is the MOST cogent and informative article I’ve yet seen on this whole mess. Thank you! From the very start Wylie struck me as a self-aggrandising blowhard, if not outright fantasist at times. I find it beyond dismaying that he’s been allowed to enter Parliament and chuck around even more claims and assertions, and every single one is just uncritically lapped up as gospel truth. And these claims cause REAL damage. Even today, he’s posted up maps and research from SCL India in response to implied claims he made yesterday in (shamefully!) our Parliament. Even THOUGH the information is freely available from SCL’s website! And even THOUGH nothing about it is untoward! Regardless, and without any check on him at all, it’s all presented with the veneer of dark and evil forces at work, and as absolute truth and ‘support’ for all his implied and/or expressly stated allegations. No balance, no monitoring, not even a modicum of attempted rationality.

    This counts DOUBLE for the way it’s all been reported on, too. There appear to be major problems with how information is collected and dispersed from social media; that’s now a given. What is also a major problem is how utterly tribal and hysterical the press now are, to the point that they’ll report anything so long as it fits a particular narrative. Cadwalladr is just utterly out of control. I get she thinks this is a Snowden-level ‘investigation’ but it really isn’t, and every day she causes huge damage to reputations without, it appears, any checks on her at all. It really does boggle the mind. How the Guardian can keep getting away with it actually staggers me. There has been a big debate regarding the law in this area, and a lot of hand-wringing from the Law Commission, but really, the time has now come to start dealing with this properly.

    Social media needs cleaning up. Facebook et al need to either properly regulate themselves, or they need to have it done for them. The laws on defamation need buffering and redrafting to bring them into line with how easily libelling can occur now on a daily basis and which fundamentally, and by increments, undermines the law overall. And Wylie needs to come under some sort of proper scrutiny once and for all; and this includes cleaning up his Wikipedia article, too.

    Thank you again for this wonderful article. I found it via a link from @wallaceme. I have now bookmarked. Thank you very much again. I FINALLY feel I know what’s going on.

  4. “… The data collected was scraped from Facebook user profiles…”
    “… The data collected was not internal Facebook data…”
    Not sure how you square that circle. The data of the 50M (friends) users was not collected directly from what users entered into Kogan’s PsychoApp, but collected by accessing FB’s friends’ personal data records, surely? What does “FB internal data” mean exactly?

  5. ” … uncritically lapped up as gospel truth … ” That’s what happens when you distrust the advice of experts – either that or – you don’t want anyone to know what’s really going on so you decry their efforts to enlighten you. I also don’t admire the brash yank’s condescending demeanour, but, hey, don’t shoot the messenger.

  6. “… it is not as effective as it likes to pretend; see the fact that Ted Cruz is not currently president…” … except that shady elites still appear to pour millions into an ineffective premise? And I’m certain the military psyops version is a little more advanced than these amateurs.

  7. We seem to be in an era where any preposterous story that can be manipulated by someone with a political agenda (here it is anti-Trump and Brexit remoaning) gets massive air time without proper questioning and therefore gets lapped up by the masses.

    Whilst this is to be expected from people like Carole and The Guardian, why are proper news outlets like the BBC not doing what they should be doing and looking impartially at both sides. Why is the truth only to be found on obscure blogs (no offence).

    The latest example is all the nonsense about gender pay gaps. Shock horror! an airline pays its pilots a lot more than its cabin crew. Whilst I would like to see more fully qualified and competent female pilots and I am sure we will in time without any interference, the fact is about 97% of pilots across all airlines are male and it is unlikely it will ever be 50%. The overwhelming majority of cabin crew are female and without some positive action to only recruit men and discriminate against women, that is not going to change.

  8. I’m not sure the BBC have been so bad. Carole is complaining about their unwillingness to cover the story in greater depth. As per the rest, well I think there is a lack of knowledge about statistics which complicates things and then there is also a reaction to the very real exploitation and prejudice that various groups have faced. It doesn’t justify all the bad reporting but it does help contextualise it.

  9. I’m drawing a distinction between what developers had access to and what Facebook has access to internally which is much more substantial. This is not to minimize what Kogan and other developers were able to access, but it certainly pales in comparison to what Facebook collects.

  10. Re the “hack” issue. Firstly, the app accessed data belonging to the app user’s friends, but these friends didn’t give their consent to their data being used and weren’t even made aware. This is contrary to EU data protection laws, and its seriousness is not reflected in the article above. Secondly, when Facebook’s UK policy director, asked if Cambridge Analytica had Facebook data, told MPs: “No … it will not be Facebook user data. It may be data about people who are on Facebook that they have gathered themselves, but it is not data that we have provided”, he wasn’t being entirely straight. Kogan got the data because FB allowed him access to data, including data belonging to the app user’s friends, without FB seeking the consent of those data subjects or even making them aware. Thirdly, FB failed to have adequate provisions in the contract between FB and Kogan (or failed to enforce those provisions) that would have prevented Kogan from using the data FB allowed him to access for any purpose other than the one agreed between FB and Kogan, namely academic purposes. Any contravention of those provisions should have resulted in a financial penalty. Fourthly, FB knew about this in 2015 but failed to notify the data subjects whose data had been disclosed by FB without their consent and then sold to a third party. This is a huge failing on FB’s behalf, the seriousness of which is not conveyed in the article above.

  11. Marsha,

    1) I specifically mention that it “was a serious concern” that profiles of friends were accessed without their explicit permission. I think this is an abuse of users’ trust, regardless of whether Facebook technically broke certain countries’ data protection laws. I suspect they would argue that by giving users the option to opt out and including acknowledgments in their terms of service they technically remained within the letter of the law, but many privacy advocates would disagree.

    2) Did you read the full text of the exchange with Milner? It is clear that he is not entirely straight but also that at the point quoted he is emphasising the distinction between Facebook’s own internal data and what it provides developers to access. You can see this because in a follow up exchange he is directly asked:

    “Christian Matheson: We may well do in the future and will see how that inquiry progresses. Is it the case that third-party users—app users, or
    whatever we might call it—can ask for a Facebook user’s data and then pull that data off Facebook and bank it?

    Simon Milner: Yes, that is part of the platform policies that we have.”

    3) If you don’t know what’s in the contract, how do you know what provisions it made for a financial penalty? I would suspect that all it gave Facebook was the right to rescind access and to require developers to destroy or return the data. As I noted in the article, such data was readily available to tens of thousands of developers, which probably did not make policing the system a priority. I don’t argue that Facebook’s approach was right, just that we should understand the context and acknowledge that they did make changes to how friend permissions work back in 2015.

    4) See point 3 above. I doubt Facebook saw this as a serious issue, aside from the press coverage it attracted, because users were likely having their data ‘breached’ by thousands of other companies. Again, I’m not defending their actions, just explaining the context. People are getting too worked up about Cambridge Analytica when they should be worked up about the broader online data issues, and here they should also recognise how Facebook’s policies have changed in the intervening years and what Facebook has agreed to do to comply with GDPR. Sensationalist outrage doesn’t help; it just leads to rampant misrepresentations and misunderstandings, and usually descends into political partisanship.

  12. Hi Chris,

    I arrived here following the link on Dominic Cummings’ blog. The central point of your argument is that CA’s platform was not the magic mind-control tool they claimed it to be.

    However, I think you need to weigh the following:
    1. The very real possibility that Kogan is a Russian agent. Like Novichok, the Russians wouldn’t say “yeah, fair cop guv'”. Yet you are happy to take his statement at face value. Given the obvious intelligence of your writing, this is not credible.
    2. If the CA* claims were hokum, the possibility that their platform nevertheless created an effect, but not because of its “magic sauce”. Instead, (1) If CA were able to match FB profiles to where you lived (roughly), then they can target you with potentially effective locally relevant content, e.g. “immigrants taking your housing in X estate, taking places at Y school etc”, (2) FB allows targeting of limited resources, so no “spray and pray”. Basic segmentation without the psycho social profiling is a powerful enough tool in and of itself. Hence FB’s huge market value.

    Essentially, I think you’ve created a strawman argument to distract from the following:
    1. There is clear evidence of coordination in the Leave campaign in an effort to frustrate the spending cap.
    2. The question as to why Leave pumped so much money through FB: clearly they thought it was money well spent.
    3. The role of Robert Mercer in capitalising these businesses in order that campaigning services could be sold at an undervalue.

    I note that Cummings has gone terribly quiet. Either he’s been told by his lawyers to shut it, or he’s practising picking up shower soap with his toes.

    *I use CA to refer to the SCL/CA/AIQ cabal. I think evidence of recent days clearly demonstrates they are, behind the corporate veil, one organisation.

  13. Brian,

    1. I don’t take his statements at face value. The article explicitly indicates that he is not a neutral party. As far as the Russian links go, what I have seen is far from conclusive. It is possible that he is a Russian agent, but the more likely account from the evidence I have seen is that he is a fairly normal academic with below average moral concerns.
    2. Yes, but that is my point. Facebook already provides lots of targeted ad options, including targeting people from specific locations. So you do not need to posit some costly and unproven psychographic profiling as being the means of delivery, bog standard political targeting will do essentially the same job. That such campaigns are facilitated by social networks is certainly true and it is something we should be concerned about and governments should be regulating.

    As far as the ‘strawman’ arguments go…

    1. From what I’ve seen there is growing evidence, but it will be up to the relevant regulatory bodies to judge whether the standard of evidence for coordination is met. I remain somewhat sceptical that this furor will result in anything but some symbolic rebuking.
    2. Sure, social networking advertising was a priority medium for their political targeting. I’m not disputing this, here or in the original piece.
    3. Robert Mercer has clearly been incredibly influential in promoting and funding these businesses, among other nefarious activities. Again, I do not dispute this. The part I would dispute is that this means the companies he promotes are incredibly effective, or at least any more effective than those used by opposing campaigns.

    The Leave/Remain campaigns were mired in special interest groups seeking to influence the outcome. You will see accusations and counter-accusations flung around endlessly, both as a result of the CA fallout and more generally because of how strongly people feel about the issue. I am, personally, strongly in the Remain camp and find a lot of the political messaging and tactics the Leave campaign engaged in to be vile, but I also recognise that the Remain campaign was not blameless, and that the (eventual) support from Cameron has enabled criticisms to be levelled at some of the support the Remain campaign received (e.g. publishing those leaflets). This isn’t me asserting a false equivalence either; I still think that overall the Leave campaign engaged in much more dishonest tactics, but because of the murky context, I seriously doubt the current revelations will have any real impact except to further split those who fall in either ideological camp.
