Facebook’s Cambridge Analytica problems are nothing compared to what’s coming for all of online publishing

Let’s start with Facebook’s Surveillance Machine, by Zeynep Tufekci in last Monday’s New York Times. Among other things (all correct), Zeynep explains that “Facebook makes money, in other words, by profiling us and then selling our attention to advertisers, political actors and others. These are Facebook’s true customers, whom it works hard to please.”

Irony Alert: the same is true for the Times, along with every other publication that lives off adtech: tracking-based advertising. These pubs don’t just open the kimonos of their readers. They bring readers’ bare digital necks to vampires ravenous for the blood of personal data, all for the purpose of aiming “interest-based” advertising at those same readers, wherever those readers’ eyeballs may appear—or reappear in the case of “retargeted” advertising.

With no control by readers (beyond tracking protection which relatively few know how to use, and for which there is no one approach, standard or experience), and no blood valving by the publishers who bare those readers’ necks, who knows what the hell actually happens to the data?

Answer: nobody can, because the whole adtech “ecosystem” is a four-dimensional shell game with hundreds of players, or, in the case of “martech,” thousands.

For one among many views of what’s going on, here’s a compressed screen shot of what Privacy Badger showed going on in my browser behind Zeynep’s op-ed in the Times:

[Added later…] @ehsanakhgari tweeted a pointer to WhoTracksMe’s page on the NYTimes, which shows this:

And here’s more irony: a screen shot of the home page of RedMorph, another privacy protection extension:

That quote is from Free Tools to Keep Those Creepy Online Ads From Watching You, by Brian X. Chen and Natasha Singer, and published on 17 February 2016 in the Times.

The same irony applies to countless other correct and important reports on the Facebook/Cambridge Analytica mess by other writers and pubs. Take, for example, Cambridge Analytica, Facebook, and the Revelations of Open Secrets, by Sue Halpern in yesterday’s New Yorker. Here’s what RedMorph shows going on behind that piece:

Note that I have the data leak toward Facebook.net blocked by default.

Here’s a view through RedMorph’s controller pop-down:

And here’s what happens when I turn off “Block Trackers and Content”:

By the way, I want to make clear that Zeynep, Brian, Natasha and Sue are all innocents here, thanks both to the “Chinese wall” between the editorial and publishing functions at their publications, and to the simple fact that the route any ad takes between advertiser and reader through any number of adtech intermediaries is akin to a ball falling through a pinball machine. Refresh your page while reading any of those pieces and you’ll see a different set of ads, no doubt aimed by automata guessing that you, personally, should be “impressed” by those ads. (They’ll count as “impressions” whether you are or not.)

Now…

What will happen when the Times, the New Yorker and other pubs own up to the simple fact that they are just as guilty as Facebook of leaking their readers’ data to other parties, for, in many if not most cases, God knows what purposes besides “interest-based” advertising? And what happens when the EU comes down on them too? It’s game-on after 25 May, when the EU can start fining violators of the General Data Protection Regulation (GDPR). Key fact: the GDPR protects the data blood of what they call “EU data subjects” wherever those subjects’ necks are exposed in the borderless digital world.

To explain more about how this works, here is the (lightly edited) text of a tweet thread posted this morning by @JohnnyRyan of PageFair:

Facebook left its API wide open, and had no control over personal data once those data left Facebook.

But there is a wider story coming: (thread…)

Every single big website in the world is leaking data in a similar way, through “RTB bid requests” for online behavioural advertising #adtech.

Every time an ad loads on a website, the site sends the visitor’s IP address (indicating physical location), the URL they are looking at, and details about their device, to hundreds (often thousands) of companies. Here is a graphic that shows the process.

The website does this to let these companies “bid” to show their ad to this visitor. Here is a video of how the system works. In Europe this accounts for about a quarter of publishers’ gross revenue.

Once these personal data leave the publisher, via “bid request”, the publisher has no control over what happens next. I repeat that: personal data are routinely sent, every time a page loads, to hundreds/thousands of companies, with no control over what happens to them.

This means that every person, and what they look at online, is routinely profiled by companies that receive these data from the websites they visit. Where possible, these data are combined with offline data. These profiles are built up in “DMPs”.

Many of these DMPs (data management platforms) are owned by data brokers. (Side note: The FTC’s 2014 report on data brokers is shocking. See https://www.ftc.gov/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014.) There is no functional difference between an #adtech DMP and Cambridge Analytica.

—Terrell McSweeny, Julie Brill and EDPS

None of this will be legal under the #GDPR. (See one reason why at https://t.co/HXOQ5gb4dL). Publishers and brands need to take care to stop using personal data in the RTB system. Data connections to sites (and apps) have to be carefully controlled by publishers.

So far, #adtech’s trade body has been content to cover over this wholesale personal data leakage with meaningless gestures that purport to address the #GDPR (see my note on @IABEurope’s current actions here: https://t.co/FDKBjVxqBs). It is time for a more practical position.

And advertisers, who pay for all of this, must start to demand that safe, non-personal data take over in online RTB targeting. RTB works without personal data. Brands need to demand this to protect themselves – and all Internet users too. @dwheld @stephan_lo @BobLiodice

Websites need to control:
1. which data they release into the RTB system
2. whether ads render directly in visitors’ browsers (where DSPs’ JavaScript can drop trackers)
3. what 3rd parties get to be on their page
@jason_kint @epc_angela @vincentpeyregne @earljwilkinson 11/12

Let’s work together to fix this. 12/12
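To make Johnny’s description concrete, here is a minimal Python sketch of what one bid request might carry, what “broadcasting” it to bidders amounts to, and the kind of publisher-side filter his first recommendation implies. It is an illustration under stated assumptions, not a real exchange integration: the field names are loosely modeled on the IAB’s OpenRTB 2.x bid request (real requests carry many more fields), and the values, the 300 bidder names and the rules in the hypothetical `minimize()` function (drop the page URL, user ID and location; truncate the IPv4 address to its /24 network) are all invented for illustration.

```python
import copy
import ipaddress

# Hypothetical bid request. Field names loosely follow OpenRTB 2.x
# (id, imp, site.page, device.ip, device.ua, device.geo, user.id);
# every value here is invented for illustration.
BID_REQUEST = {
    "id": "req-0001",
    "imp": [{"id": "1", "banner": {"w": 300, "h": 250}}],
    "site": {
        "domain": "news.example",
        "page": "https://news.example/politics/article-123",
    },
    "device": {
        "ip": "203.0.113.57",
        "ua": "Mozilla/5.0 (X11; Linux x86_64)",
        "geo": {"lat": 42.37, "lon": -71.11},
    },
    "user": {"id": "u-8f3a2c"},
}


def broadcast(request, bidders):
    """Simulate sending an identical copy of the request to every bidder.

    Nothing here (or in real RTB) lets the publisher control what each
    recipient does with its copy afterwards.
    """
    return {name: copy.deepcopy(request) for name in bidders}


def minimize(request):
    """Hypothetical publisher-side filter applied before broadcasting.

    Keeps the ad slot and site domain, drops the exact page URL, the user
    ID and precise location, and truncates an IPv4 address to its /24
    network (203.0.113.57 becomes 203.0.113.0). The input is not modified.
    """
    safe = copy.deepcopy(request)
    safe.get("site", {}).pop("page", None)
    safe.pop("user", None)
    device = safe.get("device", {})
    device.pop("geo", None)
    if "ip" in device:
        net = ipaddress.ip_network(f"{device['ip']}/24", strict=False)
        device["ip"] = str(net.network_address)
    return safe


bidders = [f"dsp-{n}" for n in range(300)]  # hypothetical bidder names
raw = broadcast(BID_REQUEST, bidders)
clean = broadcast(minimize(BID_REQUEST), bidders)

print(len(raw), raw["dsp-42"]["device"]["ip"], raw["dsp-42"]["site"]["page"])
print(clean["dsp-42"]["device"]["ip"], sorted(clean["dsp-42"]))
```

Every one of the 300 raw copies carries the reader’s full IP address and the exact article URL. The minimized copies still let a bidder price the slot by site and ad size, which is one way to read “RTB works without personal data.” Whether a truncated IP address is enough to satisfy the GDPR is a legal question this sketch doesn’t answer.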

Those last three recommendations are all good, but they also assume that websites, advertisers and their third-party agents are the ones with the power to do something. Not readers.

But there’s a lot readers will be able to do. More about that shortly. Meanwhile, publishers can get right with readers by dropping #adtech and going back to publishing the kind of high-value brand advertising they’ve run since forever in the physical world.

That advertising, as Bob Hoffman (@adcontrarian) and Don Marti (@dmarti) have been making clear for years, is actually worth a helluva lot more than adtech, because it delivers clear creative and economic signals and comes with no cognitive overhead (for example, wondering where the hell an ad comes from and what it’s doing right now).

As I explain here, “Real advertising wants to be in a publication because it values the publication’s journalism and readership” while “adtech wants to push ads at readers anywhere it can find them.”

Going back to real advertising is the easiest fix in the world, but so far it’s nearly unthinkable because we’ve been defaulted for more than twenty years to an asymmetric power relationship between readers and publishers called client-server. I’ve been told that client-server was chosen as the name for this relationship because “slave-master” didn’t sound so good; but I think the best way to visualize it is calf-cow:

As I put it at that link (way back in 2012), “Client-server, by design, subordinates visitors to websites. It does this by putting nearly all responsibility on the server side, so visitors are just users or consumers, rather than participants with equal power and shared responsibility in a truly two-way relationship between equals.”

It doesn’t have to be that way. Beneath the Web, the Net’s TCP/IP protocol—the gravity that holds us all together in cyberspace—remains no less peer-to-peer and end-to-end than it was in the first place. Meaning there is nothing to the Net that prevents each of us from having plenty of power on our own.

On the Net, we don’t need to be slaves, cattle or blood bags. We can be human. In legal terms, we can operate as first parties rather than second ones. In other words, the sites of the world can click “agree” to our terms, rather than the other way around.

Customer Commons is working on exactly those terms. The first publication to agree to readers’ terms is Linux Journal, where I am now the editor-in-chief. The first of those terms will say “just show me ads not based on tracking me,” and is hashtagged #DoNotByte.

In Help Us Cure Online Publishing of Its Addiction to Personal Data, I explain how this models the way advertising ought to be done: by the grace of readers, with no spying.

Obeying readers’ terms also carries no risk of violating privacy laws, because every pub will have contracts with its readers to do the right thing. This is totally do-able. Read that last link to see how.

As I say there, we need help. Linux Journal still has a small staff, and Customer Commons (a California-based 501(c)(3) nonprofit) so far consists of five board members. What it aims to be is a worldwide organization of customers, as well as the place where terms we proffer can live, much as Creative Commons is where personal copyright licenses live. (Customer Commons is modeled on Creative Commons. Hats off to the Berkman Klein Center for helping bring both into the world.)

I’m also hoping other publishers, once they realize that they are no less a part of the surveillance economy than Facebook and Cambridge Analytica, will help out too.

[Later…] Not long after this post went up I talked about these topics on the Gillmor Gang. Here’s the video, plus related links.

I think the best push-back I got there came from Esteban Kolsky (@ekolsky), who (as I recall anyway) saw less than full moral equivalence between what Facebook and Cambridge Analytica did to screw with democracy and what the New York Times and other ad-supported pubs do by baring the necks of their readers to dozens of data vampires.

He’s right that they’re not equivalent, any more than apples and oranges are equivalent. The sins are different; but they are still sins, just as apples and oranges are still both fruit. Exposing readers to data vampires is simply wrong on its face, and we need to fix it. That it’s normative in the extreme is no excuse. Nor is the fact that it makes money. There are morally uncompromised ways to make money with advertising, and those are still available.

Another push-back is the claim by many adtech third parties that the personal data blood they suck is anonymized. While that may be so, anonymized records can still be re-identified by correlating them with other data. See Study: Your anonymous web browsing isn’t as anonymous as you think, by Barry Levine (@xBarryLevine) in Martech Today, which cites De-anonymizing Web Browsing Data with Social Networks, a study by Jessica Su (@jessicatsu), Ansh Shukla (@__anshukla__) and Sharad Goel (@5harad) of Stanford and Arvind Narayanan (@random_walker) of Princeton.
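Why doesn’t anonymizing help much? Here is a toy Python sketch, assuming an observer holds one browser’s “anonymous” list of visited URLs plus the public social feeds of a few candidate people. The scoring rule is a deliberate simplification, not the model used in the Su, Shukla, Goel and Narayanan study, and the names and URLs are hypothetical. What it shows is the intuition: links everybody sees say little, but a few rarely shared links can point to one person.

```python
import math
from collections import Counter


def rank_candidates(history, feeds):
    """Rank candidate people by how well their feed explains a browsing history.

    history: set of URLs from one "anonymous" browser.
    feeds:   dict of candidate name -> set of URLs that appeared in their feed.

    Simplified scoring (an assumption, not the published study's model):
    each visited URL found in a candidate's feed adds log(N / k), where N
    is the number of candidates and k is how many feeds contain that URL.
    A link in everyone's feed adds 0; a link in only one feed adds the most.
    """
    n = len(feeds)
    popularity = Counter(url for feed in feeds.values() for url in feed)
    scores = {
        name: sum(math.log(n / popularity[url]) for url in history & feed)
        for name, feed in feeds.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data for illustration only.
feeds = {
    "alice": {"viral.example/cat", "local.example/zoning-vote", "blog.example/rust-tips"},
    "bob": {"viral.example/cat", "sports.example/scores"},
    "carol": {"viral.example/cat", "blog.example/rust-tips", "recipes.example/soup"},
}
history = {"viral.example/cat", "local.example/zoning-vote", "blog.example/rust-tips"}

for name, score in rank_candidates(history, feeds):
    print(f"{name}: {score:.3f}")
```

Here alice ranks first because hers is the only feed containing the local zoning story, while the viral link that appears in every feed contributes nothing to anyone’s score.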

(Note: Facebook and Google follow logged-in users by name. They also account for most of the adtech business.)

One commenter below noted that this blog also carries six trackers (most of which I block). Here is how those look on Ghostery:

So let’s fix this thing.

[Later still…] Lots of comments on Hacker News as well.

[Later again (8 April 2018)…] About the comments below (60+ so far): the version of commenting used by this blog doesn’t support threading. If it did, my responses to comments would appear below each one. Alas, some comments appear out of sequence, and others don’t appear at all. I don’t know why, but I’m trying to find out. Meanwhile, apologies.

72 comments

  1. Michael Markman

    What sets FB apart from other ad-supported data-rapers is its mechanism for propagating and amplifying manipulative messages by means of friend-to-friend sharing, liking, and commenting.

  2. Paul Grange

    “By the way, I want to make clear that Zeynep, Brian and Natasha are all innocents here, thanks to the “Chinese wall” between the editorial and publishing functions of the Times.”

    That is utter nonsense. Check your six. You are eating bad lies.

  3. AJ Fish

    Excited for Linux Journal with Customer Commons. Our lefty hipster bookstore wasn’t carrying it and I requested a copy. I’ll subscribe by snail mail.

    I worked in adtech as recently as 2010 and it was made very clear we were not to record people’s names, ages, addresses, as all of that was irrelevant to our ad targeting. If someone bought baby clothes the auctioning system threw up an ad for, say, Tide. We put in some AI and propensity models since consumers grow immune to ads done the same way. So when people were egging me to join Facebook, and it was required to enter my birthdate to join, I smelled something funny right away.

    Serendipitous ads are really fun now. Our cable was on the fritz, so we bought an HDTV antenna and recently discovered Geico commercials and a strange dating site, farmersonly.com? There’s a whole world out there. We escaped from our own Truman Show.

  4. Doc Searls

    Paul, the old “Chinese wall” was one where journalists intentionally ignored what was easy for them to know, namely what companies advertised with their employer. With the new “Chinese wall” they can’t. Nobody knows exactly. Reader A sees ads for cruise ships and reader B sees ads for the last thing they looked at on Amazon. Both refresh their pages and see completely different ads for other stuff. The whole adtech system is a four-dimensional shell game that nobody—even those in it—can fully comprehend.

    It is true that writers should know about that shell game, however. I look forward to more of them reporting on it.

  5. Mario

    I do not believe that the GDPR will stop this.

    As long as the user gives consent, tracking can continue.

    I have tested this under the current regulation. Even a government institution knows how to play: they asked my consent to use my email for “exchanges” regarding Unemployment insurance.
    What I understood is that they would use my email whenever they needed to contact me about a particular situation with regards to myself or my company.

    However, the first and only use of this email address was to send me a generic mailing that was signed with the notification “to not reply to this mail”.

    “Exchanges” implies bidirectional communication, according to a renowned dictionary in my local language.

    Yet the institution’s designated officer replied that their mail was allowed because it was sent within the bounds of the institution’s mission.

    This is one of the examples that makes me conclude that data collection will quietly continue as long as it fits the collector’s interpretation of the law.

  6. Doc Searls

    Thanks, Mario.

    We can’t expect regulators and collectioneers alone to get us out of the pickle we’ve been in ever since we settled on client-server as the defaulted way individuals and institutions interact on the Internet, while also doing nothing to replace contracts of adhesion (those one-sided things we click “accept” to) that have been the norm in big business ever since industry won the industrial revolution. We also need developers to come up with ways that allow individuals to protect their own privacy while also operating as first parties in their legal dealings with the institutions of the world. The Internet was actually designed to support that, and the arrival of the GDPR and other privacy regulation has improved the willingness of countless companies to start dealing with individuals in new ways. The changes we’re working on won’t all happen at once or everywhere, but they will happen, and I invite everyone who wants to do more than complain about bad acting by big entities to join our work.

  7. Mario

    I do more than complain.
    I remind my customers about their obligations, I protect personal data and access to the systems I build (excessively, according to some customers), I no longer store passwords in my browser (which happened to store them online), I have distinct passwords for each site, etc.

    I “complain” because as an entrepreneur I have a better view on how companies are acting than the general public, because of my recent experience with the current legislation, and because of what I see is happening.

    1.
    In order to reduce the amount of spam I received, I have also been exercising my rights under the current legislation by requesting information about the use of my personal data. These are essentially the same rights as those announced in the GDPR (http://www.privacy-regulation.eu/en/15.htm).

    My conclusion is that those that are not very prepared are also those that are most willing to fulfill their obligations. They respond kindly and promptly provide any information they can about the origin of the data. I have discovered that several blindly trust the seller of the data, i.e. they assume the seller obtained my agreement for this purpose.

    Those that are prepared are rarely prompt and rarely respond. And, a few exceptions aside, they are not actually kind and do everything they can to evade their obligation to respond. When they do respond, they answer only part of the questions, generally indicating that the personal data has been deleted without indicating how, why and where it was collected, who they shared it with and what they actually collected. There is no reason this will not continue once the GDPR is in force.

    2.
    In another particular case, the company seems to perfectly respect all the rules: the company publishes detailed information about the data being collected, who they share it with, how it is used, etc. Everything is public except the actual data they have on you.
    “Unfortunately” this company failed to respect its own public rules. They shared the data with another entity that they should not have shared it with. I know because when I share my email, I create a new one specific to the company I share it with.
    When asked about the incident, the company has stayed essentially silent. It acknowledged my request but has not answered yet; it still has 1 month to do so.

    3.
    When I discuss it with other entrepreneurs, the conclusion is that it is not so difficult to respect the GDPR: the company has to ask for permission and provide traceability of the actions it performs on the personal data.

    4.
    Also, in France, public institutions will not have to pay fines if they do not respect the GDPR.
    They only risk several types of warnings, a limitation or prohibition to use the data, and/or the withdrawal of an authorisation (http://www.lagazettedescommunes.com/554866/donnees-personnelles-le-senat-inscrit-les-collectivites-dans-le-texte/ ).
    While this spares taxpayers from eventually paying the fines a few town halls would otherwise incur, it leaves town halls with no real incentive to abide by the rules.

    5.
    What matters in the end is this: will the complaint of the individual with the supervisory authority have an effect?
    Filing an admissible complaint is already “difficult” in itself. How many people can make one solid enough, and how many will actually care to file one?

  8. matt

    I’m blocking 6 trackers on your blog right now, including Google and Facebook. Talk is cheap.

  9. Mark Andrews

    That’s why I use the Tor Browser (which has first party isolation, anti-fingerprinting defense, and stream isolation–which means that I get a different circuit for each different first party website).

  10. Doc Searls

    Mario,

    Those are all good and helpful points. Thanks!

  11. Paul M

    The backlash against tracking has been predicted for years.
    it’s possible to have effective advertising without using them, e.g. see Grapeshot based in Cambridge UK – https://www.grapeshot.com/

  12. Jason Bennett

    Your first screencap references the New Yorker, not the Times.

  13. Doc Searls

    Thanks, Jason. Fixed. I couldn’t get RedMorph to give me the same screen shot I had used originally (in the first draft here), so I went with PrivacyBadger instead.

    All the privacy monitoring and protection extensions yield different results and have different controls. And the sites are often different when one visits them at different times. But that’s another point for another post.

  14. metasj

    Wonderful piece. Passed around.

    I also have a blogs.law blog, and it only has WordPress trackers (which I’m not sure how to turn off). Curious about where the LinkedIn, Facebook, and Pinterest trackers here come from — attached to this skin?

  15. oalrus

    As long as we want to have ‘free’ web content and web services this will continue to be a problem. If people want free stuff then payment will have to be obtained somewhere, and that payment is going to be information and attention.
    If you don’t want to have your data collected and your behavior modified through targeted marketing you need to look for economic models that don’t rely on advertising.

    My proposal is oalrus.com. It tries to use behavioral economics to overcome the allure of free and to make rewarding creators frictionless and pleasurable. Check it out and sign up to encourage further development.

  16. adnauseam-user

    And one way to push publishers faster is to use AdNauseam (http://adnauseam.io) …

  17. Qeote

    I have seen newspaper reports that an app created in the name of the Indian head of government has been sucking up massive amounts of personal data and passing it on to an American firm without its users’ consent. Once this was caught, a consent clause was added to the privacy policy.

  18. Doc Searls

    Why no AdNauseam on Safari?

  19. Doc Searls

    Qeote, that would be Aadhaar. It’s worth reading the Wikipedia unpacking of concerns about it. Lots there.

  20. Billy Meinke

    You might also like this piece I recently wrote about student data being harvested by textbook publishers:

    https://medium.com/@billymeinke/student-data-harvested-by-education-publishers-they-haz-more-than-u-think-4a952e0853de

    Aloha.

  21. Rastislav Hatala

    The solution for all this is decentralisation of the web and advertising… by putting it all on blockchain… Check out projects like https://ipfs.io/ https://basicattentiontoken.org/ https://akasha.world/ and many more, which are already working on and offering solutions.

  22. Doc Searls

    Thanks, Rastislav. I’m familiar with all those and have covered them elsewhere, but do need to cover them more. So stay tuned for that.

  23. Jason Montoya

    Interesting read, and thanks for sharing. When the news came out about this issue, it struck me as odd, because Facebook is simply one of many companies leveraging personal data for their benefit. I appreciate you vocalizing this reality here.

  24. Yuhong Bao

    Feel free to email me about the essay.

  25. Dave

    Not your fault (it’s Harvard’s), but I find it odd that you reply to other comments and not to @matt, who noticed the SAME thing I have: your “blog” has SIX trackers attached, including Facebook and Google.

    Kind of makes me hesitate to put a real email in. (I am, but it’s a trash one I use.)

  26. Roland

    1. Once the tech companies have the data, is it your data or their data? I believe it is their data.

    2. It’s a bit of a leap for most people, but the tech companies actually do not track you. They track your equipment (e.g., mobile phone) and how it interacts with their equipment (e.g., page visited, data you entered on their server such as name and address, and how you caused their equipment to operate, such as recording a Like impression).

  27. Mike Warot

    The root cause of this is several layers removed from the focus of discussion; please forgive me for leaving out a lot of details.
    1> Because of “default privilege”, operating systems trust everything, which is insane. This means that Linux, Windows, Mac OS, and pretty much everything else can’t be made secure.
    2> Because computers can’t be made secure, they require active IT support to help try to keep the mess contained, and to fix any breaches that occur. This means that running a server of one’s own isn’t really an option for 99% of people
    3> This forces people to either find “free” services, or to hire them to host content. This biases everything towards walled gardens.
    4> Once the walled gardens go up, they become “safe” as network effects kick in… “safe” means you’re not likely to have your (insecure) computer catch anything from them.
    5> Over time network effects and lock-in result in a lot of money changing hands and the inevitable corruption that results.
    6> Here we are today…. several steps removed from the original issue that MUST be solved first… otherwise it’s all going to be patchwork forever.

  28. Doc Searls

    matt, I replied to this earlier, but apparently it didn’t take. I also added a screen shot of the trackers to the post, in hope that it will get some action (or at least some explaining) from the Harvard folk who maintain the blog. We’ll see how it goes. If you look at that screen shot, you can see I blocked most of the trackers too.

  29. Thien

    Whatever I searched for on Google would show up in Facebook’s ad recommendations. Has anyone else ever encountered that???

  30. Graham Sadd

    Doc, I responded to a recent post/video on LinkedIn by Andrew Grill. This is a brief extract:

    “I have been saying for a while now that consumers are poised to take back control of their data. I believe within 5 years what happened with Facebook and CA would not happen again because my data would be in my personal cloud, and Facebook and others would have to broker a digital deal (via AI-powered digital agents) to secure access to my data, for a fee (or prohibit them completely from using it).”

    And my comment:

    I predicted this serious issue many years ago and founded PAOGA on the principle that organisations, private and public, global and local, must realise that their business needs to take seriously the cliche ‘People Are Our Greatest Asset’.
    During that time I have been told that “There is no privacy, get over it!” and “Nothing to hide, nothing to fear!” most recently from Amber Rudd. I am still awaiting her response to my request for her Banking and Medical records.

    1. Social Networks (Facebook etc.) provide a platform. – PAOGA provides a platform
    2. Their Users provide the Content. – Our Users provide the Content.
    3. ‘Messaging’ employs end-to-end encryption. – We employ Person-to-Person encryption.
    4. The Network ‘shares’ their User/Content for Advertising Revenue. – PAOGA protects our User/Content for Subscription Revenue.

    • Total security of data/information in transit and in storage (extending GDPR data protection to the data subject).
    • Every element individually, automatically and uniquely encrypted.
    • Total anonymity and total control.
    • PAOGA cannot access or share our clients’ Personal Information, data or documents – zero knowledge encryption.

    I remember clearly a comment some 15 years ago from JP Rangaswami on a panel who questioned the relevance of the word ‘Relationship’ in CRM as it was, and is, a one-way relationship. From this provocative comment I realised that, in the developing Social Network community, we needed TRM, Trusted Relationship Management, to enable peer-to-peer connections with those you trust and are relevant in the multiple roles that we all play; student, employee, employer, client, patient, parent etc.
    Your AntiSocial Network – Not instead of Social Networks but alongside.

    It has been a long time, Doc, since we discussed VRM in Oxford. Our concerns were valid then, but it has taken more than a decade for the market to realise the risks and the benefits. See you when you are next over.

  31. Tom Foremski

    Google and Facebook’s AI and algorithms will dramatically improve their spy and snitch technologies, resulting in further resentment by their users and inevitable future data scandals. The issue with consumer data is not that advertisers want us to buy their products but that we will be judged by others: politically, economically and socially. And faulty triangulation of anonymous databases will likely cause huge mistakes in profiles that will last for years.


Comments are now closed.