Research


Wikipedia

In Students are told not to use Wikipedia for research. But it’s a trustworthy source, Rachel Cunneen and Mathieu O’Neil nicely unpack the case for their headline. In an online polylogue responding to that piece, I wrote,

“You always have a choice: to help or to hurt.” That’s what my mom told me, a zillion years ago. It applies to everything we do, pretty much.

The purpose of Wikipedia is to help. Almost entirely, it does. It is a work of positive construction without equal or substitute. That some use it to hurt, or to spread false information, does not diminish Wikipedia’s worth as a resource.

The trick for researchers using Wikipedia as a resource is not a difficult one: don’t cite it. Dig down in references, make sure those are good, and move on from there. It’s not complicated.

Since that topic and comment are due to slide down into the Web’s great forgettery (where Google searches do not go), I thought I’d share it here.

If you listen to Episode 49: Parler, Ownership, and Open Source, the latest edition of the Reality 2.0 podcast, you’ll learn that I was blindsided at first by the topic of Parler, which has lately become a thing. But I caught up fast, even getting a Parler account not long after the show ended, because I wanted to see what’s going on.

Though self-described as “the world’s town square,” Parler is actually a centralized social platform built for two purposes: 1) completely free speech; and 2) creating and expanding echo chambers.

The second may not be what Parler’s founders intended (see here), but that’s how social media algorithms work. They group people around engagements, especially likes. (I think, for our purposes here, that algorithmically nudged engagement is a defining feature of social media platforms as we understand them today. That would exclude, for example, Wikipedia or a popular blog or newsletter with lots of commenters. It would include, say, Reddit and LinkedIn, because algorithms.)
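To make that concrete, here is a minimal sketch, in Python, of the kind of engagement-weighted ranking being described. The weights and field names are my own illustrative assumptions, not any platform’s actual algorithm; the point is only that sorting a feed by engagement, especially engagement from people you already follow, nudges everyone toward their own echo chamber.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    likes: int
    shares: int
    liked_by_people_you_follow: int  # overlap with your existing echo chamber

def engagement_score(post: Post) -> float:
    """Toy ranking: the weights are invented for illustration only.
    Posts already engaged with by people you follow get boosted,
    which quietly narrows the feed toward what your circle agrees on."""
    return (1.0 * post.likes
            + 2.0 * post.shares
            + 5.0 * post.liked_by_people_you_follow)

def rank_feed(posts: list[Post]) -> list[Post]:
    # Highest engagement first: the "algorithmic nudge" in one line.
    return sorted(posts, key=engagement_score, reverse=True)

if __name__ == "__main__":
    feed = [
        Post("a", "outrage take", likes=900, shares=300, liked_by_people_you_follow=40),
        Post("b", "quiet correction", likes=50, shares=2, liked_by_people_you_follow=1),
    ]
    for p in rank_feed(feed):
        print(round(engagement_score(p)), p.text)
```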

Let’s start with recognizing that the smallest echo chamber in these virtual places is our own, composed of the people we follow and who follow us. Then note that our visibility into other virtual spaces is limited by what’s shown to us by algorithmic nudging, such as by Twitter’s trending topics.

The main problem with this is not knowing what’s going on, especially inside other echo chambers. There are also lots of reasons for not finding out. For example, my Parler account sits idle because I don’t want Parler to associate me with any of the people it suggests I follow as soon as I show up:

I also don’t know what to make of this, which is the only other set of clues on the index page:

Especially since clicking on any of them brings up the same or similar top results, which seem to have nothing to do with the trending # topic. Example:

Thus endeth my research.

But serious researchers should be able to see what’s going on inside the systems that produce these echo chambers, especially Facebook’s.

The problem is that Facebook and other social networks are shell games, designed to make sure nobody knows exactly what’s going on, while everybody feels okay with it, because they’re hanging with others who agree on the basics.

The design principle at work here is obscurantism—”the practice of deliberately presenting information in an imprecise, abstruse manner designed to limit further inquiry and understanding.”

To put the matter in relief, consider a nuclear power plant:

(Photo of Kraftwerk Grafenrheinfeld, 2013, by Avda. Licensed CC BY-SA 3.0.)

Nothing here is a mystery. Or, if there is one, professional inspectors will be dispatched to solve it. In fact, the whole thing is designed from the start to be understandable, and its workings accountable to a dependent public.

Now look at a Facebook data center:

What it actually does is pure mystery, by design, to those outside the company. (And hell, to most, maybe all, of the people inside the company.) No inspector arriving to look at a rack of blinking lights in that place is going to know either. What Facebook looks like to you, to me, to anybody, is determined by a pile of discoveries, both on and off of Facebook’s site and app, around who you are and what, to machines, you seem interested in, and an algorithmic process that is not accountable to you, and impossible for anyone, perhaps including Facebook itself, to fully explain.

All societies, and groups within societies, are echo chambers. And, because they cohere in isolated (and isolating) ways, it is sometimes hard for societies to understand each other, especially when they already have prejudicial beliefs about each other. Still, without the further influence of social media, researchers can look at and understand what’s going on.

Over in the digital world, which overlaps with the physical one, we at least know that social media amplifies prejudices. But, though it’s obvious by now that this is what’s going on, doing something to reduce or eliminate the production and amplification of prejudices is damn near impossible when the mechanisms behind it are obscure by design.

This is why I think these systems need to be turned inside out, so researchers can study them. I don’t know how to make that happen; but I do know there is nothing else so large and consequential in the world that is also so closed to academic inquiry. And that ain’t right.

BTW, if Facebook, Twitter, Parler or other social networks actually are opening their algorithmic systems to academic researchers, let me know and I’ll edit this piece accordingly.

We know more than we can tell.

That one-liner from Michael Polanyi has been waiting half a century for a proper controversy, which it now has with facial recognition. Here’s how he explains it in The Tacit Dimension:

This fact seems obvious enough; but it is not easy to say exactly what it means. Take an example. We know a person’s face, and can recognize it among a thousand others, indeed among a million. Yet we usually cannot tell how we recognize a face we know. So most of this knowledge cannot be put into words.

Polanyi calls that kind of knowledge tacit. The kind we can put into words he calls explicit.

For an example of both at work, consider how, generally, we don’t know how we will end the sentences we begin, or how we began the sentences we are ending—and how the same is true of what we hear or read from other people whose sentences we find meaningful. The explicit survives only as fragments, but the meaning of what was said persists in tacit form.

Likewise, if we are asked to recall and repeat, verbatim, a paragraph of words we have just said or heard, we will find it difficult or impossible to do so, even if we have no trouble saying exactly what was meant. This is because tacit knowing, whether kept to one’s self or told to others, survives the natural human tendency to forget particulars after a few seconds, even when we very clearly understand what we have just said or heard.

Tacit knowledge and short-term memory are both features of human knowing and communication, not bugs. Even for people with extreme gifts of memorization (e.g. actors who can learn a whole script in one pass, or mathematicians who can learn pi to 4000 decimals), what matters more than the words or the numbers is their meaning. And that meaning is both more and other than what can be said. It is deeply tacit.

On the other hand—the digital hand—computer knowledge is only explicit, meaning a computer can know only what it can tell. At both knowing and telling, a computer can be far more complete and detailed than a human could ever be. And the more a computer knows, the better it can tell. (To be clear, a computer doesn’t know a damn thing. But it does remember—meaning it retrieves—what’s in its databases, and it does process what it retrieves. At all those activities it is inhumanly capable.)

So, the more a computer learns of explicit facial details, the better it can infer conclusions about that face, including ethnicity, age, emotion, wellness (or lack of it), and much else. Given a base of data about individual faces, and of names associated with those faces, a computer programmed to be adept at facial recognition can also connect faces to names, and say “This is (whomever).”
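For a rough sense of the mechanics behind “connecting faces to names,” here is a minimal sketch of one-to-many matching: faces are reduced to numeric feature vectors, and a new face is assigned to whichever stored face is closest. The vectors, names, and threshold below are invented for illustration; real systems derive the vectors from trained neural networks and search galleries of millions.

```python
import math

# Toy "gallery": name -> face embedding. The numbers are made up for
# illustration; real systems compute these vectors with a neural network.
gallery = {
    "Alice": [0.90, 0.10, 0.30],
    "Bob":   [0.20, 0.80, 0.50],
}

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(probe, threshold=0.5):
    """One-to-many identification: compare a probe face embedding against
    every entry in the gallery and return the closest name, or None if
    nothing is close enough."""
    name, best = min(((n, distance(probe, v)) for n, v in gallery.items()),
                     key=lambda item: item[1])
    return name if best <= threshold else None

print(identify([0.88, 0.12, 0.28]))  # -> Alice
print(identify([0.0, 0.0, 0.0]))     # -> None (no confident match)
```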

For all those reasons, computers doing facial recognition are proving useful for countless purposes: unlocking phones, finding missing persons and criminals, aiding investigations, shortening queues at passport portals, reducing fraud (for example at casinos), confirming age (saying somebody is too old or not old enough), finding lost pets (which also have faces). The list is long and getting longer.

Yet many (or perhaps all) of those purposes are at odds with the sense of personal privacy that derives from the tacit ways we know faces, our reliance on short-term memory, and our natural anonymity (literally, namelessness) among strangers. All of those are graces of civilized life in the physical world, and they are threatened by the increasingly widespread use—and uses—of facial recognition by governments, businesses, schools, and each other.

Louis Brandeis and Samuel Warren visited the same problem more than 130 years ago, when they became alarmed at the privacy risks suggested by photography, audio recording, and reporting on both via technologies that were far more primitive than those we have today. As a warning to the future, they wrote a landmark Harvard Law Review paper titled The Right to Privacy, which has served as a pole star of good sense ever since. Here’s an excerpt:

Recent inventions and business methods call attention to the next step which must be taken for the protection of the person, and for securing to the individual what Judge Cooley calls the right “to be let alone.” Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life; and numerous mechanical devices threaten to make good the prediction that “what is whispered in the closet shall be proclaimed from the house-tops.” For years there has been a feeling that the law must afford some remedy for the unauthorized circulation of portraits of private persons; and the evil of invasion of privacy by the newspapers, long keenly felt, has been but recently discussed by an able writer. The alleged facts of a somewhat notorious case brought before an inferior tribunal in New York a few months ago directly involved the consideration of the right of circulating portraits; and the question whether our law will recognize and protect the right to privacy in this and in other respects must soon come before our courts for consideration.

They also say the “right of the individual to be let alone…is like the right not to be assaulted or beaten, the right not to be imprisoned, the right not to be maliciously prosecuted, the right not to be defamed.”

To that list today we might also add, “the right not to be reduced to bits” or “the right not to be tracked like an animal—whether anonymously or not.”

But it’s hard to argue for those rights in our digital world, where computers can see, hear, draw and paint exact portraits of everything: every photo we take, every word we write, every spreadsheet we assemble, every database accumulating in our hard drives—plus those of every institution we interact with, and countless ones we don’t (or do without knowing the interaction is there).

Facial recognition by computers is a genie that is not going back in the bottle. And there are no limits to wishes the facial recognition genie can grant the organizations that want to use it, which is why pretty much everything is being done with it. A few examples:

  • Facebook’s Deep Face sells facial recognition for many purposes to corporate customers. Examples from that link: “Face Detection & Landmarks…Facial Analysis & Attributes…Facial Expressions & Emotion… Verification, Similarity & Search.” This is non-trivial stuff. Writes Ben Goertzel, “Facebook has now pretty convincingly solved face recognition, via a simple convolutional neural net, dramatically scaled.”
  • FaceApp can make a face look older, younger, whatever. It can even swap genders.
  • The FBI’s Next Generation Identification (NGI) involves (says Wikipedia) eleven companies and the National Center for State Courts (NCSC).
  • Snap has a patent for reading emotions in faces.
  • The MORIS™ Multi-Biometric Identification System is “a portable handheld device and identification database system that can scan, recognize and identify individuals based on iris, facial and fingerprint recognition,” and is typically used by law enforcement organizations.
  • Casinos in Canada are using facial recognition to “help addicts bar themselves from gaming facilities.” It’s opt-in: “The technology relies on a method of ‘self-exclusion,’ whereby compulsive gamblers volunteer in advance to have their photos banked in the system’s database, in case they ever get the urge to try their luck at a casino again. If that person returns in the future and the facial-recognition software detects them, security will be dispatched to ask the gambler to leave.”
  • Cruise ships are boarding passengers faster using facial recognition by computers.
  • Australia proposes scanning faces to see if viewers are old enough to look at porn.

Facial recognition systems are also getting better and better at what they do. A November 2018 NIST report on a massive study of facial recognition systems begins,

This report documents performance of face recognition algorithms submitted for evaluation on image datasets maintained at NIST. The algorithms implement one-to-many identification of faces appearing in two-dimensional images.

The primary dataset is comprised of 26.6 million reasonably well-controlled live portrait photos of 12.3 million individuals. Three smaller datasets containing more unconstrained photos are also used: 3.2 million webcam images; 2.5 million photojournalism and amateur photographer photos; and 90 thousand faces cropped from surveillance-style video clips. The report will be useful for comparison of face recognition algorithms, and assessment of absolute capability. The report details recognition accuracy for 127 algorithms from 45 developers, associating performance with participant names. The algorithms are prototypes, submitted in February and June 2018 by research and development laboratories of commercial face recognition suppliers and one university…

The major result of the evaluation is that massive gains in accuracy have been achieved in the last five years (2013-2018) and these far exceed improvements made in the prior period (2010-2013). While the industry gains are broad — at least 28 developers’ algorithms now outperform the most accurate algorithm from late 2013 — there remains a wide range of capabilities. With good quality portrait photos, the most accurate algorithms will find matching entries, when present, in galleries containing 12 million individuals, with error rates below 0.2%

Privacy freaks (me included) would like everyone to be creeped out by this. Yet many people are cool with it to some degree, and not just because they’re acquiescing to the inevitable: they’re relying on it because it makes interaction with machines easier—and they trust it.

For example, in Barcelona, CaixaBank is rolling out facial recognition at its ATMs, claiming that 70% of surveyed customers are ready to use it as an alternative to keying in a PIN, and that “66% of respondents highlighted the sense of security that comes with facial recognition.” That the bank’s facial recognition system “has the capability of capturing up to 16,000 definable points when the user’s face is presented at the screen” is presumably of little or no concern. Nor, also presumably, is the risk of what might get done with facial data if the bank gets hacked, or if it changes its privacy policy, or if it gets sold and the new owner can’t resist selling or sharing facial data with others who want it, or if (though more like when) government bodies require it.

A predictable pattern for every new technology is that what can be done will be done—until we see how it goes wrong and try to stop doing that. This has been true of every technology from stone tools to nuclear power and beyond. Unlike many other new technologies, however, it is not hard to imagine ways facial recognition by computers can go wrong, especially when it already has.

Two examples:

  1. In June, U.S. Customs and Border Protection, which relies on facial recognition and other biometrics, revealed that photos of people were compromised by a cyberattack on a federal subcontractor.
  2. In August, researchers at vpnMentor reported a massive data leak in BioStar 2, a widely used “Web-based biometric security smart lock platform” that uses facial recognition and fingerprinting technology to identify users. Notes the report, “Once stolen, fingerprint and facial recognition information cannot be retrieved. An individual will potentially be affected for the rest of their lives.” vpnMentor also had a hard time getting through to company officials so the leak could be fixed.

As organizations should know (but in many cases have trouble learning), the highest risks of data exposure and damage come with—

  • the largest data sets,
  • the most complex organizations and relationships, and
  • the largest variety of existing and imaginable ways that security can be breached.

And let’s not discount the scary potentials at the (not very) far ends of technological progress and bad intent. Killer microdrones targeted at faces, anyone?

So it is not surprising that some large companies doing facial recognition go out of their way to keep personal data out of their systems. For example, by making facial recognition work for the company’s customers, but not for the company itself.

Such is the case with Apple’s late-model iPhones, which feature Face ID: a personal facial recognition system that lets a person unlock their phone with a glance. Says Apple, “Face ID data doesn’t leave your device and is never backed up to iCloud or anywhere else.”

But assurances such as Apple’s haven’t stopped push-back against all facial recognition. Some examples—

  • The Public Voice: “We the undersigned call for a moratorium on the use of facial recognition technology that enables mass surveillance.”
  • Fight for the Future: BanFacialRecognition. Self-explanatory, and with lots of organizational signatories.
  • New York Times: “San Francisco, long at the heart of the technology revolution, took a stand against potential abuse on Tuesday by banning the use of facial recognition software by the police and other agencies. The action, which came in an 8-to-1 vote by the Board of Supervisors, makes San Francisco the first major American city to block a tool that many police forces are turning to in the search for both small-time criminal suspects and perpetrators of mass carnage.”
  • Also in the Times, Evan Selinger and Woodrow Hartzog write, “Stopping this technology from being procured — and its attendant databases from being created — is necessary for protecting civil rights and privacy. But limiting government procurement won’t be enough. We must ban facial recognition in both public and private sectors before we grow so dependent on it that we accept its inevitable harms as necessary for ‘progress.’ Perhaps over time, appropriate policies can be enacted that justify lifting a ban. But we doubt it.”
  • Cory Doctorow’s Why we should ban facial recognition technology everywhere is an “amen” to the Selinger & Hartzog piece.
  • BanFacialRecognition.com lists 37 participating organizations, including EPIC (Electronic Privacy Information Center), Daily Kos, Fight for the Future, MoveOn.org, National Lawyers Guild, Greenpeace and Tor.
  • MIT Technology Review says bans are spreading in the U.S.: “San Francisco and Oakland, California, and Somerville, Massachusetts, have outlawed certain uses of facial recognition technology, with Portland, Oregon, potentially soon to follow. That’s just the beginning, according to Mutale Nkonde, a Harvard fellow and AI policy advisor. That trend will soon spread to states, and there will eventually be a federal ban on some uses of the technology, she said at MIT Technology Review’s EmTech conference.”

Irony alert: the black banner atop that last story says, “We use cookies to offer you a better browsing experience, analyze site traffic, personalize content, and serve targeted advertisements.” Notes the Times’ Charlie Warzel, “Devoted readers of the Privacy Project will remember mobile advertising IDs as an easy way to de-anonymize extremely personal information, such as location data.” Well, advertising IDs are among the many trackers that both MIT Technology Review and The New York Times inject into readers’ browsers with every visit. (Bonus link.)

My own position on all this is provisional because I’m still learning and there’s a lot to take in. But here goes:

The only entities that should be able to recognize people’s faces are other people. And maybe their pets. But not machines.

But, since the facial recognition genie will never go back in its bottle, I’ll suggest a few rules for entities using computers to do facial recognition. All these are provisional as well:

  1. People should have their own forms of facial recognition, for example, to unlock phones, sort through old photos, or to show to others the way they would a driving license or a passport (to say, in effect, “See? This is me.”) But, the data they gather for themselves should not be shared with the company providing the facial recognition software (unless it’s just of their own face, and then only for the safest possible diagnostic or service improvement purposes). This, as I understand it, is roughly what Apple does with iPhones.
  2. Systems that use facial recognition to detect changing facial characteristics (such as emotions, age, or wellness) should be required to forget what they see, right after the job is done, and not use the data gathered for any purpose other than diagnostics or performance improvement.
  3. For persons having their faces recognized, sharing data for diagnostic or performance improvement purposes should be opt-in, with data anonymized and made as auditable as possible, by individuals and/or their intermediaries.
  4. For enterprises with systems that know individuals’ (customers’ or consumers’) faces, don’t use those faces to track or find those individuals elsewhere in the online or offline worlds—again, unless those individuals have opted into the practice.

I suspect that Polanyi would agree with those.

But my heart is with Walt Whitman, whose Song of Myself argued against the dehumanizing nature of mechanization at the dawn of the industrial age. Wrote Walt,

Encompass worlds but never try to encompass me.
I crowd your noisiest talk by looking toward you.

Writing and talk do not prove me.
I carry the plenum of proof and everything else in my face.
With the hush of my lips I confound the topmost skeptic…

Do I contradict myself?
Very well then. I contradict myself.
I am large. I contain multitudes.

The spotted hawk swoops by and accuses me.
He complains of my gab and my loitering.

I too am not a bit tamed. I too am untranslatable.
I sound my barbaric yawp over the roofs of the world.

The barbaric yawps by human hawks say five words, very explicitly:

Get out of my face.

And they yawp those words in spite of the sad fact that obeying them may prove impossible.

[Later bonus links…]

 

This is wrong:

Because I’m not blocking ads. I’m blocking tracking.

In fact I welcome ads—especially ones that sponsor The Washington Post and other fine publishers. I’ll also be glad to subscribe to the Post once it stops trying to track me off their site. Same goes for The New York Times, The Wall Street Journal and other papers I value and to which I no longer subscribe.

Right now Privacy Badger protects me from 20 and 35 potential trackers, respectively, at those papers’ sites, in addition to the 19 it finds at the Post. Most of those trackers are for stalking readers like marked animals, so their eyeballs can be shot with “relevant,” “interest-based” and “interactive” ads they would never request if they had much choice about it—and in fact have already voted against with ad blocking, which by 2015 was already the biggest boycott in world history. As I point out in that link (and as Don Marti did earlier in DCN), there was in that time frame a high correlation between interest in blocking ads and interest (surely by the ad industry) in retargeting, which is the most obvious evidence to people that they are being tracked. See here:

Tracking-based ads, generally called adtech, do not sponsor publications. They use publications as holding pens in which human cattle can be injected with uninvited and unwelcome tracking files (generally called cookies) so their tracked eyeballs can be shot, wherever they might show up, with ads aimed by whatever surveillance data has been gleaned from those eyeballs’ travels about the Net.

Real advertising—the kind that makes brands and sponsors publications—doesn’t track people. Instead it is addressed to whole populations. In doing so it sponsors the media it uses, and testifies to those media’s native worth. Tracking-based ads can’t and don’t do that.

That tracking-based ads pay, and are normative in the extreme, does not make right the Post‘s participation in the practice. Nor does it make correct the bad thinking (and reporting!) behind notices such as the one above.

Let’s also be clear about two myths spread by the “interactive” (aka “relevant” and “interest-based”) advertising business:

  1. That the best online advertising is also the most targeted—and “behavioral” as well, meaning informed by knowledge about an individual, typically gathered by tracking. This is not the kind of advertising that made Madison Avenue, that created nearly every brand you can name, and that has sponsored publishers and other media for the duration. Instead it is direct marketing, aka direct response marketing. Both of those labels are euphemistic re-brandings that the direct mail business gave itself after the world started calling it junk mail. Sure, much (or most) of the paid messages we see online are called advertising, and look like advertising; but as long as they want to get personal, they’re direct marketing.
  2. That tracking-based advertising (direct marketing by another name) is the business model of the “free” Internet. In fact the Internet at its base is as free as gravity and sunlight, and floats all business boats, whether based on advertising or not.

Getting the world to mistake direct marketing for real advertising is one of the great magic tricks of all time: a world record for misdirection in business. To help explain the difference, I wrote Separating Advertising’s Wheat From Chaff, the most quoted line from which is “Madison Avenue fell asleep, direct response marketing ate its brain, and it woke up as an alien replica of itself.” Alas, the same is true for the business offices of the Post and every other publisher that depends on tracking. They ceased selling their pages as spaces for sponsors and turned those spaces over to data vampires living off the blood of readers’ personal data.

There is a side for those publishers to take on this thing, and it’s not with the tracking-based advertising business. It is with their own moral backbone, and with the readers who still keep faith in it.

If any reporter (e.g. @CraigTimberg, @izzadwoskin, @nakashimae and @TonyRomm) wants to talk to me about this, write me at doc at searls.com or DM me here on Twitter.* Thanks.

Bonus link (and metaphor)

*So far, silence. But hey: I know I’m asking journalists to grab a third rail here. And it’s one that needs to be grabbed. There might even be a Pulitzer for whoever grabs it. Because the story is that big, and it’s not being told, at least not by any of the big pubs. The New York Times’ Privacy Project has lots of great stuff, but none that grabs the third rail. The closest the Times has come is You’re not alone when you’re on Google, by Jennifer Senior (@JenSeniorNY). In it she says “your newspaper” (alas, not this one) is among the culprits. But it’s a step. We need more of those. (How about it, @cwarzel?)†

[Later…] We actually have a great model for how the third rail might be grabbed, because The Wall Street Journal wrestled with it mightily in the What They Know series, which ran from 2010 to 2012. For most of the years after that, the whole series, which was led by Julia Angwin and based on lots of great research, was available on the Web for everybody at http://wsj.com/wtk. But that’s a 404 now. If you want to see a directory of the earliest pieces, I list them in a July 2010 blog post titled The Data Bubble. That post begins,

The tide turned today. Mark it: 31 July 2010.

That’s when The Wall Street Journal published The Web’s Gold Mine: Your Secrets, subtitled A Journal investigation finds that one of the fastest-growing businesses on the Internet is the business of spying on consumers. First in a series. It has ten links to other sections of today’s report.

Alas, the tide did not turn. It kept coming in and getting deeper. And now we’re drowning under it.

† I did hear from Charlie Warzel (@cwarzel), who runs the Privacy Project series at the Times, and who assured me that they would be covering the issue. And (Yay!) it did, with I Visited 47 Sites. Hundreds of Trackers Followed Me, by Farhad Manjoo (@fmanjoo). This was followed by a critique of that piece titled Privacy Fundamentalism, by Ben Thompson in Stratechery. I responded to both with On Privacy Fundamentalism. So check those out too.

The answer is, we don’t know. Also, we may never know, because—

  • It’s too hard to measure (especially if you’re talking about the entire Net).
  • Too much of the usage is on mobile devices of too many different kinds.
  • The browser makers are approaching ad blocking and tracking protection in different and new ways that change frequently, and the same goes for ad-blocking and tracking-protecting extensions and add-ons. One of them (Adblock Plus) is actually in the advertising business (which Wikipedia politely calls ad filtering), in the sense that it sells safe passage to paying advertisers.
  • Some of the most easily sourced measures are surveys, yet what people say and what they do can be very different things.
  • Some of the most widely cited findings are from sources with conflicted interests (for example, selling anti-ad-blocking services), or which aggregate multiple sources that aren’t revealed when cited.
  • Actors good and bad in the ecosystem that ad blocking addresses also contribute to the fudge.

But let’s explore a bit anyway, working with what we’ve got, flawed though much of it may be. If you’re a tl;dr kind of reader, jump down to the conclusions at the end.

Part 1: ClarityRay and PageFair

Between 2012 and 2017, the most widely cited ad blocking reports were by ClarityRay and PageFair, in that order. There are no links to ClarityRay’s 2012 report, which I cited here in 2013. PageFair’s links to its 2015, 2016 (mobile) and 2017 reports are still live. The company also said last November that it was at work on another report. This was after PageFair was acquired by Blockthrough (“the leading adblock recovery program”). A PageFair blog post explains it.

I placed a lot of trust in PageFair’s work, mostly because I respected Dr. Johnny Ryan (@JohnnyRyan), who left PageFair for Brave in 2018. I also like what I know about Matthew Cortland, who was also at PageFair, and may still be. Far as I know, he hasn’t written anything about ad blocking research (but maybe I’ve missed it) since 2017.

Here are the main findings from PageFair’s 2017 report:

  • 615 million devices now use adblock
  • 11% of the global internet population is blocking ads on the web

Part 2: GlobalWebIndex

In January 2016, GlobalWebIndex said “37% of mobile users … say they’ve blocked ads on their mobile within the last month.” I put that together with Statista’s 2017 claim that there were then more than 4.6 billion mobile phone users in the world, which suggested that 1.7 billion people were blocking ads by that time.

Now GlobalWebIndex’s Global Ad-Blocking Behavior report says 47% of us are blocking ads. It also says, “As a younger and more engaged audience, ad-blockers also are much more likely to be paying subscribers and consumers. Ad-free premium services are especially attractive.” Which is pretty close to Don Marti’s long-standing claim that readers who protect their privacy are more valuable than readers who don’t.

To get a total ad blocking population from that 47%, one possible source to cite is Internet World Stats:

Note that Internet World Stats appears to be a product of the Miniwatts Marketing Group, whose website is currently a blank WordPress placeholder. But, to be modest about it, their number is lower than Statista’s from 2016: “In 2019 the number of mobile phone users is forecast to reach 4.68 billion.” So let’s run with the lower one, at least for now.

Okay, so if 47% of us are using ad blockers, and Internet World Stats says there were 4,312,982,270 Internet users by the end of last year (that’s mighty precise!), the combined numbers suggest that more than 2,027,101,667 people are now blocking ads worldwide. So, we might generalize, more than two billion people are blocking ads today. Hence the headline above.
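Here is that arithmetic as a tiny Python check, using the two figures cited above (the result inherits all of their flaws):

```python
internet_users = 4_312_982_270   # Internet World Stats, end of last year
blocking_share = 0.47            # GlobalWebIndex: 47% say they block ads

blockers = round(internet_users * blocking_share)
print(f"{blockers:,}")  # 2,027,101,667 -- roughly two billion people
```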

Perspective: back in 2015, we were already calling ad blocking The biggest boycott in human history. And that was when the number was just “approaching 200 million.”

More interesting to me is GlobalWebIndex’s breakouts of listed reasons why the people surveyed blocked ads. Three in particular stand out:

  • Ads contain viruses or bugs, 38%
  • Ads might compromise my online privacy, 26%
  • Stop ads being personalized, 22%

The problem here, as I said in the list up top, is that these are not measured behaviors. They are sympathies. But they’re still significant, because sympathies sell. That means there are markets here. Opportunities to align incentives.

Part 3: Ad Fraud Researcher

I rely a great deal on Dr. Augustine Fou (@acfou), aka Independent Ad Fraud Researcher, to think and work more deeply and knowingly than I’ve done so far here (or may ever do).

Looking at Part 2 above (in an earlier version of this post), he tweeted, “I dispute these findings. ASKING people if they used an ad blocker in the past month is COMPLETELY inaccurate and inconsistent with people who ACTUALLY USE ad blockers regularly.” Also, “Source: GlobalWebIndex Q3 2018 Base: 93,803 internet users aged 16-64, among which were 42,078 respondents who have used an ad-blocker in the past month”. Then, “Are you going to take numbers extrapolated from 42,078 respondents and extrapolate that to the entire world? that would NOT be OK.” And, “Desktop ad blocking in the U.S. measured directly on sites which humans visit is in the 8 – 19% range. Bots must also be scrubbed because bots do not block ads and will skew ad blocking rates lower, if not removed.”

On that last tweet he points to his own research, published this month. There is lots of data in there, all of it interesting and unbiased. Then he adds, “your point about this being the ‘biggest boycott in human history’ is still valid. But the numbers from that ad blocking study should not be used.”
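Fou’s point about bots is easy to show with a back-of-the-envelope correction: bots don’t block ads, so leaving them in the denominator drags the measured rate down. The traffic numbers below are invented for illustration:

```python
# Hypothetical site traffic, for illustration only.
total_pageviews = 1_000_000
blocked_pageviews = 120_000      # pageviews where ads were blocked
bot_share = 0.25                 # fraction of traffic that is bots (bots don't block ads)

naive_rate = blocked_pageviews / total_pageviews
human_pageviews = total_pageviews * (1 - bot_share)
adjusted_rate = blocked_pageviews / human_pageviews

print(f"naive:    {naive_rate:.1%}")    # 12.0%
print(f"adjusted: {adjusted_rate:.1%}") # 16.0% -- higher once bots are scrubbed
```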

Part 4: Comscore

Among the many helpful tweets in response to the first draft of this post was this one by Zubair Shafiq (@zubair_shafiq), Assistant Professor of Computer Science at the University of Iowa, where he researches computer networks, security, and privacy. His tweet points to Ad Blockers: Global Prevalence and Impact, by Matthew Malloy, Mark McNamara, Aaron Cahn and Paul Barford, from 2016. Here is one chart among many in the report:

The jive in the Geo row is explained at that link. A degree in statistics will help.

Part 5: Statista

Statista seems serious, but Ad blocking user penetration rate in the United States from 2014 to 2020 is behind a paywall. Still, they do expose this hunk of text: “The statistic presents data on ad blocking user penetration rate in the United States from 2014 to 2020. It was found that 25.2 percent of U.S. internet users blocked ads on their connected devices in 2018. This figure is projected to grow to 27.5 percent in 2020.”

Provisional Conclusions

  1. The number is huge, but we don’t know how huge.
  2. Express doubt about any one large conclusion. Augustine Fou cautions me (and all of us) to look at where the data comes from, why it’s used, and how. In the case of Statista, for example, the data is aggregated from other sources. They don’t do the research themselves. It’s also almost too easy to copy and paste (as I’ve done here) images that might themselves be misleading. The landmark book on misleading statistics—no less relevant today than when it was written in 1954 (and perhaps more relevant than ever)—is How to Lie With Statistics.
  3. Everything is changing. For example, browsers are starting to obsolesce the roles played by ad blocking and tracking protection extensions and add-ons. Brave is the early leader, IMHO. Safari, Firefox and even Chrome are all making moves in this direction. Also check out Ghostery’s Cliqz. For some perspective on how long this is taking, take a look at what I was calling for way back in 2015.
  4. Still, the market is sending a massive message. And that’s what fully matters. The message is this: advertising online has come to have massively negative value.

Ad blocking and tracking protection are legitimate and eloquent messages from demand to supply. By fighting that message, marketing is crapping on the most obvious and gigantic clue it has ever seen. And the supply side of the market isn’t just marketers selling stuff. It’s developers who need to start working for the hundreds of millions of customers who have proven their value by using these tools.

Enforcing Data Protection: A Model for Risk-Based Supervision Using Responsive Regulatory Tools, a post by Dvara Research, summarizes Effective Enforcement of a Data Protection Regime, a deeply thought and researched paper by Beni Chugh (@BeniChugh), Malavika Raghavan (@teninthemorning), Nishanth Kumar (@beamboybeamboy) and Sansiddha Pani (@julupani). While it addresses proximal concerns in India, it provides useful guidance for data regulators everywhere.

An excerpt:

Any data protection regulator faces certain unique challenges. The ubiquitous collection and use of personal data by service providers in the modern economy creates a vast space for a regulator to oversee. Contraventions of a data protection regime may not immediately manifest and when they do, may not have a clear monetary or quantifiable harm. The enforcement perimeter is market-wide, so a future data protection authority will necessarily interface with other sectoral institutions.  In light of these challenges, we present a model for enforcement of a data protection regime based on risk-based supervision and the use of a range of responsive enforcement tools.

This forward-looking approach considers the potential for regulators to employ a range of softer tools before a breach to prevent it and after a breach to mitigate the effects. Depending on the seriousness of contraventions, the regulator can escalate up to harder enforcement actions. The departure from the focus on post-data breach sanctions (that currently dominate data protection regimes worldwide) is an attempt to consider how the regulatory community might act in coordination with entities processing data to minimise contraventions of the regime.

I hope European regulators are looking at this. Because, as I said in a headline to a post last month, without enforcement, the GDPR is a fail.

Bonus link from the IAPP (International Association of Privacy Professionals): When will we start seeing GDPR enforcement actions? We guess Feb. 22, 2019.

In The Big Short, investor Michael Burry says “One hallmark of mania is the rapid rise in the incidence and complexity of fraud.” (Burry shorted the mania- and fraud-filled subprime mortgage market and made a mint in the process.)

One would be equally smart to bet against the mania for the tracking-based form of advertising called adtech.

Since tracking people took off in the late ’00s, adtech has grown to become a four-dimensional shell game played by hundreds (or, if you include martech, thousands) of companies, none of which can see the whole mess, or can control the fraud, malware and other forms of bad acting that thrive in the midst of it.

And that’s on top of the main problem: tracking people without their knowledge, approval or a court order is just flat-out wrong. The fact that it can be done is no excuse. Nor is the monstrous sum of money made by it.

Without adtech, the EU’s GDPR (General Data Protection Regulation) would never have happened. But the GDPR did happen, and as a result websites all over the world are suddenly posting notices about their changed privacy policies, use of cookies, and opt-in choices for “relevant” or “interest-based” (translation: tracking-based) advertising. Email lists are doing the same kinds of things.

“Sunrise day” for the GDPR is 25 May. That’s when the EU can start smacking fines on violators.

Simply put, your site or service is a violator if it extracts or processes personal data without personal permission. Real permission, that is. You know, where you specifically say “Hell yeah, I wanna be tracked everywhere.”

Of course what I just said greatly simplifies what the GDPR actually utters, in bureaucratic legalese. The GDPR is also full of loopholes only snakes can thread; but the spirit of the law is clear, and the snakes will be easy to shame, even if they don’t get fined. (And legitimate interest, an actual loophole in the GDPR, may prove hard to claim.)

Toward the aftermath, the main question is What will be left of advertising—and what it supports—after the adtech bubble pops?

Answers require knowing the differences between advertising and adtech, which I liken to wheat and chaff.

First, advertising:

    1. Advertising isn’t personal, and doesn’t have to be. In fact, knowing it’s not personal is an advantage for advertisers. Consumers don’t wonder what the hell an ad is doing where it is, who put it there, or why.
    2. Advertising makes brands. Nearly all the brands you know were burned into your brain by advertising. In fact the term branding was borrowed by advertising from the cattle business. (Specifically by Procter and Gamble in the early 1930s.)
    3. Advertising carries an economic signal. Meaning that it shows a company can afford to advertise. Tracking-based advertising can’t do that. (For more on this, read Don Marti, starting here.)
    4. Advertising sponsors media, and those paid by media. All the big pro sports salaries are paid by advertising that sponsors game broadcasts. For lack of sponsorship, media—especially publishers—are hurting. @WaltMossberg learned why on a conference stage when an ad agency guy said the agency’s ads wouldn’t sponsor Walt’s new publication, recode. Walt: “I asked him if that meant he’d be placing ads on our fledgling site. He said yes, he’d do that for a little while. And then, after the cookies he placed on Recode helped him to track our desirable audience around the web, his agency would begin removing the ads and placing them on cheaper sites our readers also happened to visit. In other words, our quality journalism was, to him, nothing more than a lead generator for target-rich readers, and would ultimately benefit sites that might care less about quality.” With friends like that, who needs enemies?

Second, Adtech:

    1. Adtech is built to undermine the brand value of all the media it uses, because it cares about eyeballs more than media, and it causes negative associations with brands. Consider this: perhaps a $trillion or more has been spent on adtech, and not one brand known to the world has been made by it. (Bob Hoffman, aka the Ad Contrarian, is required reading on this.)
    2. Adtech wants to be personal. That’s why it’s tracking-based. Though its enthusiasts call it “interest-based,” “relevant” and other harmless-sounding euphemisms, it relies on tracking people. In fact it can’t exist without tracking people. (Note: while all adtech is programmatic, not all programmatic advertising is adtech. In other words, programmatic advertising doesn’t have to be based on tracking people. Same goes for interactive. Programmatic and interactive advertising will both survive the adtech crash.)
    3. Adtech spies on people and violates their privacy. By design. Never mind that you and your browser or app are anonymized. The ads are still for your eyeballs, and correlations can be made.
    4. Adtech is full of fraud and a vector for malware. @ACFou is required reading on this.
    5. Adtech incentivizes publications to prioritize “content generation” over journalism. More here and here.
    6. Intermediators take most of what’s spent on adtech. Bob Hoffman does a great job showing how as little as 3¢ of a dollar spent on adtech actually makes an “impression.” The most generous number I’ve seen is 12¢. (When I was in the ad agency business, back in the last millennium, clients complained about our 15% take. Media our clients bought got 85%.)
    7. Adtech gives fake news a business model, because fake news is easier to produce than the real kind, and adtech will pay anybody a bounty for hauling in eyeballs.
    8. Adtech incentivizes hate speech and tribalism by giving both—and the platforms that host them—a business model too.
    9. Adtech relies on misdirection. See, adtech looks like advertising, and is called advertising; but it’s really direct marketing, which is descended from junk mail and a cousin of spam. Because of that misdirection, brands think they’re placing ads in media, while the systems they hire are actually chasing eyeballs to anywhere. (Pro tip: if somebody says every ad needs to “perform,” or that the purpose of advertising is “to get the right message to the right person at the right time,” they’re actually talking about direct marketing, not advertising. For more on this, read Rethinking John Wanamaker.)
    10. Compared to advertising, adtech is ugly. Look up best ads of all time. One of the top results is for the American Advertising Awards. The latest winners they’ve posted are the Best in Show for 2016. Tops there is an Allstate “Interactive/Online” ad pranking a couple at a ball game. Over-exposure of their lives online leads that well-branded “Mayhem” guy to invade and trash their house. In other words, it’s a brand ad about online surveillance.
    11. Adtech has caused the largest boycott in human history. More than a year ago, 1.7+ billion human beings were already blocking ads online.

To get a sense of what will be left of adtech after GDPR Sunrise Day, start by reading a pair of articles in AdExchanger by @JamesHercher. The first reports on the Transparency and Consent Framework published by IAB Europe. The second reports on how Google is pretty much ignoring that framework and going direct with their own way of obtaining consent to tracking:

Google’s and other consent-gathering solutions are basically a series of pop-up notifications that provide a mechanism for publishers to provide clear disclosure and consent in accordance with data regulations.

Specifically,

The Google consent interface greets site visitors with a request to use data to tailor advertising, with equally prominent “no” and “yes” buttons. If a reader declines to be tracked, he or she sees a notice saying the ads will be less relevant and asking to “agree” or go back to the previous page. According to a source, one research study on this type of opt-out mechanism led to opt-out rates of more than 70%.

Meaning only 30% of site visitors will consent to being tracked. So, say goodbye to 70% of adtech’s eyeball targets right there.

Google’s consent gathering system, dubbed “Funding Choices,” also screws most of the hundreds of other adtech intermediaries fighting for a hunk of what’s left of their market. Writes James, “It restricts the number of supply chain partners a publisher can share consent with to just 12 vendors, sources with knowledge of the product tell AdExchanger.”

And that’s not all:

Last week, Google alerted advertisers it would sharply limit use of the DoubleClick advertising ID, which brands and agencies used to pull log files from DoubleClick so campaigns could be cohesively measured across other ad servers, incentivizing buyers to consolidate spend on the Google stack.

Google also raised eyebrows last month with a new policy insisting that all DFP publishers grant it status as a data controller, giving Google the right to collect and use site data, whereas other online tech companies – mere data processors – can only receive limited data assigned to them by the publisher, i.e., the data controller.

This is also Google’s way of shifting GDPR liability onto publishers.

Publishers and adtech intermediaries can attempt to avoid Google by using Consent Management Platforms (CMPs), a new category of intermediary defined and described by IAB Europe’s Consent Management Framework. Writes James,

The IAB Europe and IAB Tech Lab framework includes a list of registered vendors that publishers can pass consent to for data-driven advertising. The tech companies pay a one-time fee between $1,000 and $2,000 to join the vendor list, according to executives from three participating companies…Although now that the framework is live, the barriers to adoption are painfully real as well.

The CMP category is pretty bare at the moment, and it may be greeted with suspicion by some publishers. There are eight initial CMPs: two publisher tech companies with roots in ad-blocker solutions, Sourcepoint and Admiral, as well as the ad tech companies Quantcast and Conversant and a few blockchain-based advertising startups…

Digital Content Next, a trade group representing online news publishers, is advising publishers to reject the framework, which CEO Jason Kint said “doesn’t meet the letter or spirit of GDPR.” Only two publishers have publicly adopted the Consent and Transparency Framework, but they’re heavy hitters with blue-chip value in the market: Axel Springer, Europe’s largest digital media company, and the 180-year-old Schibsted Media, a respected newspaper publisher in Sweden and Norway.

In other words, good luck with that.

[Later, 26 May…] Well, Google caved on this one, so apparently Google is coming to IAB Europe’s table.

[And on 30 May…] Axel Springer is also going its own way.

One big upside for IAB Europe is that its Framework contains open source code and an SDK. For a full unpacking of what’s there see the Consent String and Vendor List Format: Transparency & Consent Framework on GitHub and IAB Europe’s own FAQ. More about this shortly.
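To give a flavor of what a consent string is, here is a toy sketch of the general idea: a record of which vendors a user has consented to, packed into bits and base64-encoded so it can be passed along the ad-serving chain. This is not the actual IAB format (see the GitHub spec linked above for that); every field and function here is invented for illustration.

```python
import base64

def encode_consent(vendor_ids_consented, max_vendor_id):
    """Toy consent string: one bit per vendor ID, base64url-encoded.
    The real IAB Transparency & Consent Framework string carries more
    fields (version, timestamps, CMP ID, purposes, and so on)."""
    bits = ["1" if v in vendor_ids_consented else "0"
            for v in range(1, max_vendor_id + 1)]
    bitstring = "".join(bits)
    bitstring += "0" * (-len(bitstring) % 8)  # pad to whole bytes
    raw = int(bitstring, 2).to_bytes(len(bitstring) // 8, "big")
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_consent(token, max_vendor_id):
    raw = base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
    bitstring = bin(int.from_bytes(raw, "big"))[2:].zfill(len(raw) * 8)
    return {v for v in range(1, max_vendor_id + 1) if bitstring[v - 1] == "1"}

token = encode_consent({3, 7, 12}, max_vendor_id=16)
print(token, decode_consent(token, 16))  # round-trips to {3, 7, 12}
```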

Meanwhile, the adtech business surely knows the sky is falling. The main question is how far.

One possibility is 95% of the way to zero. That outcome is suggested by results published at PageFair last October by Dr. Johnny Ryan (@JohnnyRyan). Here’s the most revealing graphic in the bunch:

Note that this wasn’t a survey of the general population. It was a survey of ad industry people: “300+ publishers, adtech, brands, and various others…” Pause for a moment and look at that chart again. Nearly all those professionals in the business would not accept what their businesses do to other human beings.

“However,” Johnny adds, “almost a third believe that users will consent if forced to do so by ‘tracking walls’, that deny access to a website unless a visitor agrees to be tracked. Tracking walls, however, are prohibited under Article 7 of the GDPR…”

Pretty cynical, no?

The good news for both advertising and publishing is that neither needs adtech. What’s more, people can signal what they want out of the sites they visit—and from the whole marketplace. In fact the Internet itself was designed for exactly that. The GDPR just made the market a lot more willing to start hearing clues from customers that have been lying in plain sight for almost twenty years.

The first clues that fully matter are the ones we, the individuals they’ve been calling “users,” will deliver. Look for details on that in another post.

Meanwhile:

Pro tip #1: don’t bet against Google, except maybe in the short term, when sunrise will darken the whole adtech business.

Instead, bet against companies that stake their lives on tracking people, and doing that without the clear and explicit consent of the tracked. That’s most of the adtech “ecosystem” not called Google or Facebook.

Google can say it already has consent, and that it also has a legitimate interest (one of the six “lawful bases” for processing personal data under the GDPR) in the personal data it harvests from us.

Google can also live without the tracking. Most of its income comes from AdWords—its search advertising business—which is far more guided by what visitors are searching for than by whatever Google knows about those visitors.

Google is also relatively trusted, as tech companies go. Its parent, Alphabet, is also increasingly diversified. Facebook, on the other hand, does stake its life on tracking people. (I say more about Facebook’s odds here.)

Pro tip #2: do bet on any business working for customers rather than sellers. Because signals of personal intent will produce many more positive outcomes in the digital marketplace than surveillance-fed guesswork by sellers ever could, even with the most advanced AI behind it.

For more on how that will work, read The Intention Economy: When Customers Take Charge. Six years after Harvard Business Review Press published that book, what it says will start to come true. Thank you, GDPR.

Pro tip #3: do bet on developers building tools that give each of us scale in dealing with the world’s companies and governments, because those are the tools businesses working for customers will rely on to scale up their successes as well.

What it comes down to is the need for better signaling between customers and companies than can ever be possible in today’s doomed tracking-fed guesswork system. (All the AI and ML in the world won’t be worth much if the whole point of it is to sell us shit.)

Think about what customers and companies want and need about each other: interests, intentions, competencies, locations, availabilities, reputations—and boundaries.

When customers can operate both privately and independently, we’ll get far better markets than today’s ethically bankrupt advertising and marketing system could ever give us.

Pro tip #4: do bet on publishers getting back to what worked since forever offline and hardly got a chance online: plain old brand advertising that carries both an economic and a creative signal, and actually sponsors the publication rather than using the publication as a way to gather eyeballs that can be advertised at anywhere. The oeuvres of Don Marti (@dmarti) and Bob Hoffman (the @AdContrarian) are thick with good advice about this. I’ve also written about it extensively in the list compiled at People vs. Adtech. Some samples, going back through time:

  1. An easy fix for a broken advertising system (12 October 2017 in Medium and in my blog)
  2. Without aligning incentives, we can’t kill fake news or save journalism (15 September 2017 in Medium)
  3. Let’s get some things straight about publishing and advertising (9 September 2017 and the same day in Medium)
  4. Good news for publishers and advertisers fearing the GDPR (3 September 2017 in ProjectVRM and 7 October in Medium).
  5. Markets are about more than marketing (2 September 2017 in Medium).
  6. Publishers’ and advertisers’ rights end at a browser’s front door (17 June 2017 in Medium). It updates one of the 2015 blog posts below.
  7. How to plug the publishing revenue drain (9 June 2017 in Medium). It expands on the opening (#publishing) section of my Daily Tab for that date.
  8. How True Advertising Can Save Journalism From Drowning in a Sea of Content (22 January 2017 in Medium and 26 January 2017 in my blog)
  9. It’s People vs. Advertising, not Publishers vs. Adblockers (26 August 2016 in ProjectVRM and 27 August 2016 in Medium)
  10. Why #NoStalking is a good deal for publishers (11 May 2016, and in Medium)
  11. How customers can debug business with one line of code (19 April 2016 in ProjectVRM and in Medium)
  12. An invitation to settle matters with @Forbes, @Wired and other publishers (15 April 2016 and in Medium)
  13. TV Viewers to Madison Avenue: Please quit driving drunk on digital (14 April 2016, and in Medium)
  14. The End of Internet Advertising as We’ve Known It (11 December 2015 in MIT Technology Review)
  15. Ad Blockers and the Next Chapter of the Internet (5 November 2015 in Harvard Business Review)
  16. How #adblocking matures from #NoAds to #SafeAds (22 October 2015)
  17. Helping publishers and advertisers move past the ad blockade (11 October 2015 on the ProjectVRM blog)
  18. Beyond ad blocking — the biggest boycott in human history (28 September 2015)
  19. A way to peace in the adblock war (21 September 2015, on the ProjectVRM blog)
  20. How adtech, not ad blocking, breaks the social contract (23 September 2015)
  21. If marketing listened to markets, they’d hear what ad blocking is telling them (8 September 2015)
  22. Apple’s content blocking is chemo for the cancer of adtech (26 August 2015)
  23. Separating advertising’s wheat and chaff (12 August 2015, and on 2 July 2016 in an updated version in Medium)
  24. Thoughts on tracking based advertising (18 February 2015)
  25. On marketing’s terminal addiction to data fracking and bad guesswork (10 January 2015)
  26. Why to avoid advertising as a business model (25 June 2014, re-running Open Letter to Meg Whitman, which ran on 15 October 2000 in my old blog)
  27. What the ad biz needs is to exorcize direct marketing (6 October 2013)
  28. Bringing manners to marketing (12 January 2013 in Customer Commons)
  29. What could/should advertising look like in 2020, and what do we need to do now for this future? (Wharton’s Future of Advertising project, 13 November 2012)
  30. An olive branch to advertising (12 September 2012, on the ProjectVRM blog)

I expect, once the GDPR gets enforced, I can start writing about People + Publishing and even People + Advertising. (I have long histories in both publishing and advertising, by the way. So all of this is close to home.)

Meanwhile, you can get a jump on the GDPR by blocking third party cookies in your browsers, which will stop most of today’s tracking by adtech. Customer Commons explains how.

Let’s start with Facebook’s Surveillance Machine, by Zeynep Tufekci in last Monday’s New York Times. Among other things (all correct), Zeynep explains that “Facebook makes money, in other words, by profiling us and then selling our attention to advertisers, political actors and others. These are Facebook’s true customers, whom it works hard to please.”

Irony Alert: the same is true for the Times, along with every other publication that lives off adtech: tracking-based advertising. These pubs don’t just open the kimonos of their readers. They bring readers’ bare digital necks to vampires ravenous for the blood of personal data, all for the purpose of aiming “interest-based” advertising at those same readers, wherever those readers’ eyeballs may appear—or reappear in the case of “retargeted” advertising.

With no control by readers (beyond tracking protection which relatively few know how to use, and for which there is no one approach, standard, experience or audit trail), and no blood valving by the publishers who bare those readers’ necks, who knows what the hell actually happens to the data?

Answer: nobody knows, because the whole adtech “ecosystem” is a four-dimensional shell game with hundreds of players

or, in the case of “martech,” thousands:

For one among many views of what’s going on, here’s a compressed screen shot of what Privacy Badger showed going on in my browser behind Zeynep’s op-ed in the Times:

[Added later…] @ehsanakhgari tweets pointage to WhoTracksMe’s page on the NYTimes, which shows this:

And here’s more irony: a screen shot of the home page of RedMorph, another privacy protection extension:

That quote is from Free Tools to Keep Those Creepy Online Ads From Watching You, by Brian X. Chen and Natasha Singer, and published on 17 February 2016 in the Times.

The same irony applies to countless other correct and important reportage on the Facebook/Cambridge Analytica mess by other writers and pubs. Take, for example, Cambridge Analytica, Facebook, and the Revelations of Open Secrets, by Sue Halpern in yesterday’s New Yorker. Here’s what RedMorph shows going on behind that piece:

Note that I have the data leak toward Facebook.net blocked by default.

Here’s a view through RedMorph’s controller pop-down:

And here’s what happens when I turn off “Block Trackers and Content”:

By the way, I want to make clear that Zeynep, Brian, Natasha and Sue are all innocents here, thanks both to the “Chinese wall” between the editorial and publishing functions of the Times, and the simple fact that the route any ad takes between advertiser and reader through any number of adtech intermediaries is akin to a ball falling through a pinball machine. Refresh your page while reading any of those pieces and you’ll see a different set of ads, no doubt aimed by automata guessing that you, personally, should be “impressed” by those ads. (They’ll count as “impressions” whether you are or not.)
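If you want a crude look at this for yourself, here is a minimal sketch in Python (using the requests package, which you would need to install) that lists the third-party hosts named in a page’s static HTML. It is my own illustration, not anything the Times or these extensions use, and it only catches what is in the initial markup; most adtech arrives later, via JavaScript, which is exactly what extensions such as Privacy Badger, RedMorph and Ghostery are built to watch.

    import re
    import sys
    from urllib.parse import urlparse

    import requests  # pip install requests

    def third_party_hosts(url):
        """Return the external hosts named in a page's static HTML.
        Misses trackers injected later by JavaScript, which is most of them."""
        first_party = urlparse(url).hostname or ""
        html = requests.get(url, timeout=10).text
        hosts = set()
        # Grab the host part of every src= or href= URL in the markup.
        for link in re.findall(r'(?:src|href)=["\'](https?://[^"\']+)', html):
            host = urlparse(link).hostname
            if host and not host.endswith(first_party):
                hosts.add(host)
        return hosts

    if __name__ == "__main__":
        page = sys.argv[1] if len(sys.argv) > 1 else "https://example.com"
        for host in sorted(third_party_hosts(page)):
            print(host)

Run it against any ad-supported news page and count the hosts you have never heard of.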

Now…

What will happen when the Times, the New Yorker and other pubs own up to the simple fact that they are just as guilty as Facebook of leaking data about their readers to other parties, for—in many if not most cases—God knows what purposes besides “interest-based” advertising? And what happens when the EU comes down on them too? It’s game-on after 25 May, when the EU can start fining violators of the General Data Protection Regulation (GDPR). Key fact: the GDPR protects the data blood of what it calls “EU data subjects” wherever those subjects’ necks are exposed in the borderless digital world.

To explain more about how this works, here is the (lightly edited) text of a tweet thread posted this morning by @JohnnyRyan of PageFair; a sketch of what one of those “bid requests” carries follows the thread:

Facebook left its API wide open, and had no control over personal data once those data left Facebook.

But there is a wider story coming: (thread…)

Every single big website in the world is leaking data in a similar way, through “RTB bid requests” for online behavioural advertising #adtech.

Every time an ad loads on a website, the site sends the visitor’s IP address (indicating physical location), the URL they are looking at, and details about their device, to hundreds -often thousands- of companies. Here is a graphic that shows the process.

The website does this to let these companies “bid” to show their ad to this visitor. Here is a video of how the system works. In Europe this accounts for about a quarter of publishers’ gross revenue.

Once these personal data leave the publisher, via “bid request”, the publisher has no control over what happens next. I repeat that: personal data are routinely sent, every time a page loads, to hundreds/thousands of companies, with no control over what happens to them.

This means that every person, and what they look at online, is routinely profiled by companies that receive these data from the websites they visit. Where possible, these data are combined with offline data. These profiles are built up in “DMPs”.

Many of these DMPs (data management platforms) are owned by data brokers. (Side note: The FTC’s 2014 report on data brokers is shocking. See https://www.ftc.gov/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014.) There is no functional difference between an #adtech DMP and Cambridge Analytica.

—Terrell McSweeny, Julie Brill and EDPS

None of this will be legal under the #GDPR. (See one reason why at https://t.co/HXOQ5gb4dL). Publishers and brands need to take care to stop using personal data in the RTB system. Data connections to sites (and apps) have to be carefully controlled by publishers.

So far, #adtech’s trade body has been content to cover over this wholesale personal data leakage with meaningless gestures that purport to address the #GDPR (see my note on @IABEurope current actions here: https://t.co/FDKBjVxqBs). It is time for a more practical position.

And advertisers, who pay for all of this, must start to demand that safe, non-personal data take over in online RTB targeting. RTB works without personal data. Brands need to demand this to protect themselves – and all Internet users too. @dwheld @stephan_lo @BobLiodice

Websites need to control
1. which data they release into the RTB system
2. whether ads render directly in visitors’ browsers (where DSPs’ JavaScript can drop trackers)
3. what 3rd parties get to be on their page
@jason_kint @epc_angela @vincentpeyregne @earljwilkinson 11/12

Let’s work together to fix this. 12/12
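To make the “bid request” Johnny describes concrete, here is a rough sketch of the kind of fields one carries, written as a Python dict. The field names follow the general shape of the public OpenRTB spec; the values are invented, and real requests carry considerably more.

    # A rough, abbreviated sketch of what an RTB bid request can carry.
    # Field names follow the general shape of the OpenRTB spec; the values
    # are invented. Real requests are richer, and go to many bidders at once.
    bid_request = {
        "id": "example-auction-id-123",
        "site": {
            "domain": "example-news-site.com",
            "page": "https://example-news-site.com/politics/some-article",  # what you are reading
        },
        "device": {
            "ip": "203.0.113.42",       # your IP address, hence your rough location
            "ua": "Mozilla/5.0 (...)",  # your browser and operating system
            "geo": {"lat": 34.42, "lon": -119.7},
        },
        "user": {
            "id": "cookie-or-device-identifier",  # lets bidders join this to their profiles
        },
    }

    # Every one of the hundreds (or thousands) of companies receiving this
    # can keep it, whether or not it wins the auction and shows you an ad.
    print(bid_request["site"]["page"], bid_request["device"]["ip"])

That is the leak: nothing in the protocol itself makes those recipients forget what they were just told.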

Those last three recommendations are all good, but they also assume that websites, advertisers and their third party agents are the ones with the power to do something. Not readers.

But there’s lots readers will be able to do. More about that shortly. Meanwhile, publishers can get right with readers by dropping #adtech and going back to publishing the kind of high-value brand advertising they’ve run since forever in the physical world.

That advertising, as Bob Hoffman (@adcontrarian) and Don Marti (@dmarti) have been making clear for years, is actually worth a helluva lot more than adtech, because it delivers clear creative and economic signals and comes with no cognitive overhead (for example, wondering where the hell an ad comes from and what it’s doing right now).

As I explain here, “Real advertising wants to be in a publication because it values the publication’s journalism and readership” while “adtech wants to push ads at readers anywhere it can find them.”

Doing real advertising is the easiest fix in the world, but so far it’s nearly unthinkable for a tech industry that has defaulted for more than twenty years to an asymmetric power relationship between readers and publishers called client-server. I’ve been told that client-server was chosen as the name for this relationship because “slave-master” didn’t sound so good; but I think the best way to visualize it is calf-cow:

As I put it at that link (way back in 2012), “Client-server, by design, subordinates visitors to websites. It does this by putting nearly all responsibility on the server side, so visitors are just users or consumers, rather than participants with equal power and shared responsibility in a truly two-way relationship between equals.”

It doesn’t have to be that way. Beneath the Web, the Net’s TCP/IP protocol—the gravity that holds us all together in cyberspace—remains no less peer-to-peer and end-to-end than it was in the first place. Meaning there is nothing about the Net that prevents each of us from having plenty of power on our own.

On the Net, we don’t need to be slaves, cattle or throbbing veins. We can be fully human. In legal terms, we can operate as first parties rather than second ones. In other words, the sites of the world can click “agree” to our terms, rather than the other way around.

Customer Commons is working on exactly those terms. The first publication to agree to readers’ terms is Linux Journal, where I am now editor-in-chief. The first of those terms, #P2B1(beta), says “Just show me ads not based on tracking me,” and is hashtagged #NoStalking.
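Here is one purely hypothetical way such a term might be represented in machine-readable form, again sketched in Python. The term’s name, plain-language text and hashtag come from the paragraph above; the structure is my illustration, not Customer Commons’ actual format.

    # A purely hypothetical sketch of a reader-proffered term.
    # The name, plain-language text and hashtag come from the post;
    # the structure is illustrative, not Customer Commons' actual format.
    no_stalking_term = {
        "term": "#P2B1(beta)",
        "hashtag": "#NoStalking",
        "plain_language": "Just show me ads not based on tracking me",
        "proffered_by": "the reader (first party)",
        "agreed_to_by": "the publisher (second party)",
        "permits": ["ads based on the content of the page"],
        "prohibits": ["tracking readers", "targeting ads with personal data"],
    }

    print(no_stalking_term["plain_language"])

The point is that the term originates with the reader and is simple enough for both sides, and their software, to understand.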

In Help Us Cure Online Publishing of Its Addiction to Personal Data, I explain how this models the way advertising ought to be done: by the grace of readers, with no spying.

Obeying readers’ terms also carries no risk of violating privacy laws, because every pub will have contracts with its readers to do the right thing. This is totally do-able. Read that last link to see how.

As I say there, we need help. Linux Journal still has a small staff, and Customer Commons (a California-based 501(c)(3) nonprofit) so far consists of five board members. What it aims to be is a worldwide organization of customers, as well as the place where terms we proffer can live, much as Creative Commons is where personal copyright licenses live. (Customer Commons is modeled on Creative Commons. Hats off to the Berkman Klein Center for helping bring both into the world.)

I’m also hoping other publishers, once they realize that they are no less a part of the surveillance economy than Facebook and Cambridge Analytica, will help out too.

[Later…] Not long after this post went up I talked about these topics on the Gillmor Gang. Here’s the video, plus related links.

I think the best push-back I got there came from Esteban Kolsky (@ekolsky), who (as I recall anyway) saw less than full moral equivalence between what Facebook and Cambridge Analytica did to screw with democracy and what the New York Times and other ad-supported pubs do by baring the necks of their readers to dozens of data vampires.

He’s right that they’re not equivalent, any more than apples and oranges are equivalent. The sins are different; but they are still sins, just as apples and oranges are still both fruit. Exposing readers to data vampires is simply wrong on its face, and we need to fix it. That it’s normative in the extreme is no excuse. Nor is the fact that it makes money. There are morally uncompromised ways to make money with advertising, and those are still available.

Another push-back is the claim by many adtech third parties that the personal data blood they suck is anonymized. While that may be so, correlation is still possible. See Study: Your anonymous web browsing isn’t as anonymous as you think, by Barry Levine (@xBarryLevine) in Martech Today, which cites De-anonymizing Web Browsing Data with Social Networks, a study by Jessica Su (@jessicatsu), Ansh Shukla (@__anshukla__) and Sharad Goel (@5harad)
of Stanford and Arvind Narayanan (@random_walker) of Princeton.

(Note: Facebook and Google follow logged-in users by name. They also account for most of the adtech business.)

One commenter below noted that this blog itself carries six trackers (most of which I block). Here is how those look in Ghostery:

So let’s fix this thing.

[Later still…] Lots of comments in Hacker News as well.

[Later again (8 April 2018)…] About the comments below (60+ so far): the version of commenting used by this blog doesn’t support threading. If it did, my responses to comments would appear below each one. Alas, some not only appear out of sequence, but others don’t appear at all. I don’t know why, but I’m trying to find out. Meanwhile, apologies.

Just before it started, the geology meeting at the Santa Barbara Central Library on Thursday looked like this from the front of the room (where I also tweeted the same pano):

Geologist Ed Keller

Our speakers were geology professor Ed Keller of UCSB and Engineering Geologist Larry Gurrola, who also works and studies with Ed. That’s Ed in the shot below.

As a geology freak, I know how easily terms like “debris flow,” “fanglomerate” and “alluvial fan” can clear a room. But this gig was SRO. That’s because around 3:15 in the morning of January 9th, debris flowed out of canyons and deposited fresh fanglomerate across the alluvial fan that comprises most of Montecito, destroying (by my count on the map below) 178 buildings, damaging more than twice that many, and killing 23 people. Two of those—a 2-year-old girl and a 17-year-old boy—are still interred in the fresh fanglomerate and sought by cadaver dogs.* The whole thing is beyond sad and awful.

The town was evacuated after the disaster so rescue and recovery work could proceed without interference, and infrastructure could be found and repaired: a job that required removing twenty thousand truckloads of mud and rocks. That work continues while evacuation orders are gradually lifted, allowing the town to repopulate itself to the very limited degree it can.

I talked today with a friend whose business is cleaning houses. Besides grieving the dead, some of whom were friends or customers, she reports that the cleaning work is some of the most difficult she has ever faced, even in homes that were spared the mud and rocks. Refrigerators and freezers, sitting closed and without electricity for weeks, reek of death and rot. Other customers won’t be back because their houses are gone.

Highway 101, one of just two freeways connecting Northern and Southern California, runs through town near the coast and more than two miles from the mountain front. Three debris flows converged on the highway and used it as a catch basin, filling its deep parts to the height of at least one bridge before spilling over its far side and continuing to the edge of the sea. It took two weeks of constant excavation and repair work before traffic could move again. Most exits remain closed. Coast Village Road, Montecito’s Main Street, is open for employees of stores there, but little is open for customers yet, since infrastructural graces such as water are not fully restored. (I saw the Honor Bar operating with its own water tank, and a water truck nearby.) Opening Upper Village will take longer. Some landmark institutions, such as San Ysidro Ranch and La Casa Santa Maria, will take years to restore. (From what I gather, San Ysidro Ranch, arguably the nicest hotel in the world, was nearly destroyed. Its website thanks firefighters for salvation from the Thomas Fire. But nothing, I gather, could have saved it from the huge debris flow that wiped out nearly everything on the flanks of San Ysidro Creek. All the top red dots along San Ysidro Creek in the map below mark lost buildings at the Ranch.)

Here is a map with final damage assessments. I’ve augmented it with labels for the canyons and creeks (with one exception: a parallel creek west of Toro Canyon Creek):

Click on the map for a closer view, or click here to view the original. On that one you can click on every dot and read details about it.

I should pause to note that Montecito is no ordinary town. Demographically, it’s Beverly Hills draped over a prettier landscape and attractive to people who would rather not live in Beverly Hills. (In fact the number of notable persons Wikipedia lists for Montecito outnumbers those it lists for Beverly Hills by a score of 77 to 71.) Culturally, it’s a village. Last Monday in The New Yorker, one of those notable villagers, T. Coraghessan Boyle, unpacked some other differences:

I moved here twenty-five years ago, attracted by the natural beauty and semirural ambience, the short walk to the beach and the Lower Village, and the enveloping views of the Santa Ynez Mountains, which rise abruptly from the coastal plain to hold the community in a stony embrace. We have no sidewalks here, if you except the business districts of the Upper and Lower Villages—if we want sidewalks, we can take the five-minute drive into Santa Barbara or, more ambitiously, fight traffic all the way down the coast to Los Angeles. But we don’t want sidewalks. We want nature, we want dirt, trees, flowers, the chaparral that did its best to green the slopes and declivities of the mountains until last month, when the biggest wildfire in California history reduced it all to ash.

Fire is a prerequisite for debris flows, our geologists explained. So is unusually heavy rain in a steep mountain watershed. There are five named canyons, each its own watershed, above Montecito, as we see on the map above. There are more to the east, above Summerland and Carpinteria, the next two towns down the coast. Those towns also took some damage, though less than Montecito.

Ed Keller put up this slide to explain conditions that trigger debris flows, and how they work:

Ed and Larry were emphatic about this: debris flows are not landslides, nor do many start that way (though one did in Rattlesnake Canyon 1100 years ago). They are also not mudslides, so we should stop calling them that. (Though we won’t.)

Debris flows require sloped soils left bare and hydrophobic—resistant to water—after a recent wildfire has burned off the chaparral that normally (as geologists say) “hairs over” the landscape. For a good look at what those soil surfaces look like, and how they are likely to respond to rain, look at the smooth slopes on the uphill side of 101 east of La Conchita. Notice how the surface is not only a smooth brown or gray, but has a crust on it. In a way, the soil surface has turned to glass. That’s why water runs off of it so rapidly.

Wildfires are common, and chaparral is adapted to them, becoming fuel for the next fire as it regenerates and matures. But rainfalls as intense as this one are not common. In just five minutes, more than half an inch of rain fell in the steep and funnel-like watersheds above Montecito (a rate of more than six inches per hour). This happens about once every few hundred years, or about as often as a tsunami.

It’s hard to generalize about the combination of factors required, but Ed has worked hard to do that, and this slide of his is one way of illustrating how debris flows happen eventually in places like Montecito and Santa Barbara:

From bottom to top, here’s what it says:

  1. Fires happen almost regularly, spreading most widely where chaparral has matured to become abundant fuel, as the firefighters like to call it.
  2. Flood events are more random, given the relative rarity of rain and even more rare rains of “biblical” volume. But they do happen.
  3. Stream beds in the floors of canyons accumulate rocks and boulders that roll down the gradually eroding slopes over time. The depth of these is expressed as basin instability. Debris flows clear out the rocks and boulders when a big flood event comes right after a fire, and the basin becomes stable (relatively rock-free) again.
  4. The sediment yield in a flood (F) is maximum when a debris flow (DF) occurs.
  5. Debris flows tend to happen once every few hundred years. And you’re not going to get the big ones if you don’t have the canyon stream bed full of rocks and boulders.

About this set of debris flows in particular:

  1. Destruction down Oak Creek wasn’t as bad as on Montecito, San Ysidro, Buena Vista and Romero Creeks because the canyon feeding it is smaller.
  2. When debris flows hit an obstruction, such as a bridge, they seek out a new bed to flow on. This is one of the actions that creates an alluvial fan. From the map it appears something like that happened—
    1. Where the flow widened when it hit Olive Mill Road, fanning east of Olive Mill to destroy all three blocks between Olive Mill and Santa Elena Lane before taking the Olive Mill bridge across 101 and down to the Biltmore, while also helping other flows fill 101 as well. (See Mac’s comment below, and his link to a topo map.)
    2. In the area between Buena Vista Creek and its East Fork, which come off different watersheds.
    3. Where a debris flow forked south of Mountain Drive after destroying San Ysidro Ranch, continuing down both Randall and El Bosque Roads.

For those who caught (or are about to catch) Ellen’s Facetime with Oprah visiting neighbors, that happened among the red dots at the bottom end of the upper destruction area along San Ysidro Creek, just south of East Valley Road. Oprah’s own place is in the green area beside it on the left, looking a bit like Versailles. (Credit where due, though: Oprah’s was a good and compassionate report.)

Big question: did these debris flows clear out the canyon floors? We (meaning our geologists, sedimentologists, hydrologists and other specialists) won’t know until they trek back into the canyons to see how it all looks. Meanwhile, we do have clues. For example, here are after-and-before photos of Montecito, shot from space. And here is my close-up of the latter, shot one day after the event, when everything was still bare streambeds in the mountains and fresh muck in town:

See the white lines fanning back into the mountains through the canyons (Cold Spring, San Ysidro, Romero, Toro) above Montecito? Ed explained that these appear to be the washed out beds of creeks feeding into those canyons. Here is his slide showing Cold Spring Creek before and after the event:

Looking back at Ed’s basin threshold graphic above, one might say that there isn’t much sediment left for stream beds to yield, and that those in the floors of the canyons have returned to stability, meaning there’s little debris left to flow.

But that photo was of just one spot. There are many miles of creek beds to examine back in those canyons.

Still, one might hope that Montecito has now had its required 200-year event, and a couple more centuries will pass before we have another one.

Ed and Larry caution against such conclusions, emphasizing that most of Montecito’s and Santa Barbara’s inhabited parts gain their existence, beauty or both by grace of debris flows. If your property features boulders, Ed said, a debris flow put them there, and did that not long ago in geologic time.

For an example of boulders as landscape features, here are some we quarried out of our yard more than a decade ago, when we were building a house dug into a hillside:

This is deep in the heart of Santa Barbara.

The matrix mud we now call soil here is likely a mix of Juncal and Cozy Dell shale, Ed explained. Both are poorly lithified silt and erode easily. The boulders are a mix of Matilija and Coldwater sandstone, which comprise the hardest and most vertical parts of the Santa Ynez mountains. The two are so similar that only a trained eye can tell them apart.

All four of those geological formations were established long after dinosaurs vanished. All also accumulated originally as sediments, mostly on ocean floors, probably not far from the equator.

To illustrate one chapter in the story of how those rocks and sediments got here, UCSB has a terrific animation of how the transverse (east-west) Santa Ynez Mountains came to be where they are. Here are three frames in that movie:

What it shows is how, when the Pacific Plate was grinding its way northwest about eighteen million years ago, a hunk of that plate about a hundred miles long and the shape of a bread loaf broke off. At the top end were the future Malibu hills and at the bottom end was the future Point Conception, then situated south of what’s now Tijuana. The future Santa Barbara was west of the future Newport Beach. Then, when the Malibu end of this loaf got jammed at the future Los Angeles, the bottom end of the loaf swept out, clockwise and intact. At the start it was pointing at 5 o’clock, and at the end (which hasn’t really come, since the motion continues) it pointed at 9 o’clock. This was, and remains, a sideshow off the main event: the continuing crash of the Pacific Plate and the North American one.

Here is an image that helps, from that same link:

Find more geology, with lots of links, in Making sense of what happened to Montecito. I put that post up on the 15th and have been updating it since then. It’s the most popular post in the history of this blog, which I started in 2007. There are also 58 comments, so far.

I’ll be adding more to this post after I visit as much as I can of Montecito (exclusion zones permitting). Meanwhile, I hope this proves useful. Again, corrections and improvements are invited.

30 January

*I was told later, by a rescue worker who was on the case, that it was possible that both victims’ bodies had washed all the way to the ocean, and thus will never be found.

6 April 2020

In this Edhat story, Ed Keller visits a recently found prior debris flow. An excerpt:

The mud and boulders from a prehistoric debris flow, the second-to-last major flow in Montecito, have been discovered by a UCSB geologist at the Bonnymede condominiums and Hammond’s Meadow, just east of the Coral Casino.

The flow may have occurred between 1,000 and 2,000 years ago, said Ed Keller, a professor of earth science at the university. He’s calling it the “penultimate event.” It came down a channel of Montecito Creek and was likely larger on that creek than during the disaster of Jan. 9, 2018, Keller said. Of 23 people who perished on Jan. 9, 17 died along Montecito Creek.

The long interval between the two events means that the probability of another catastrophic debris flow occurring in Montecito in the next 1,000 years is very low, Keller said.

“It’s reassuring,” he said, “They’re still pretty rare events, if you consider you need a wildfire first and then an intense rainfall. But smaller debris flows could occur, and you could still get a big flash flood. If people are given a warning to evacuate, they should heed it.”

This post continues the inquiry I started with Making sense of what happened to Montecito. That post got a record number of reads for this blog, and 57 comments as well.

I expect to learn more at the community meeting this evening with UCSB geologist Ed Keller in the Faulkner Room in the main library in Santa Barbara. Here’s the Library schedule. Note that the meeting will be streamed live on Facebook.

Meanwhile, to help us focus on the geology questions, here is the final post-mudslide damage inspection map of Montecito:

I left out Carpinteria because, of the four structures flagged there, three were blue (affected) and one was yellow (minor), and none were orange (major) or red (destroyed). I’m also guessing they were damaged by flooding rather than debris flow. I also want to make the map as legible as possible, so we can focus on where the debris flows happened, and how we might understand the community’s prospects for the future.

So here are my questions, all subject to revision and correction.

  1. How much of the damage was due to debris flow alone, and how much to other factors (e.g. rain-caused flooding, broken water pipes)?
  2. Was concentration of rain the main reason why we saw flows in the canyons above Montecito, but not (or less so) elsewhere?
  3. Where exactly did the debris flow from? And has the area been surveyed well enough to predict what future debris flows might happen if we get big rains this winter and ones to follow?
  4. Do we need bigger catch basins for debris, like they have at the base of the San Gabriels, above Los Angeles’ basin?
  5. How do the slopes above Montecito and Santa Barbara differ from other places (e.g. the San Gabriels) where debris flows (and rock falls) are far more common?
  6. What geology-advised changes in our infrastructure (especially water and gas) might we make, based on what we’ve learned so far?
  7. What might we expect (that most of us don’t now) in the form of other catastrophes that show up in the geologic record? For example, earthquakes and tsunamis. See here: “This earthquake was associated with by far the largest seismic sea wave ever reported for one originating in California. Descriptive accounts indicate that it may have reached elevations of 15 feet at Gaviota, 30 to 35 feet at Santa Barbara, and 15 feet or more in Ventura. It may have even shown visible effects in the San Francisco harbor.” There is also this, which links to questions about the former report. (Still, there have been a number of catastrophic earthquakes on or affecting the South Coast, and it has been 93 years since the 1925 quake — and the whole Pacific Coast is subject to tsunamis. Here are some photos of the quake.)

Note that I don’t want to ask Ed to play a finger-pointing role here. Laying blame isn’t his job, unless he’s blaming nature for being itself.

Additional reading:

  • Dan McCaslin: Rattlesnake Canyon Fine Now for Day Hiking (Noozhawk) Pull-quote: “Santa Barbara geologist Ed Keller has said that all of Santa Barbara is built on debris flows piled up during the past 60,000 years. Around 1100 A.D., a truly massive debris flow slammed through Rattlesnake Canyon into Mission Canyon, leaving large boulders as far down as the intersection of Alamar Avenue and State Street (go check). There were Chumash villages in the area, and they may have been completely wiped out then. While some saddened Montecitans claim that sudden flash floods and debris flows should have been forecast more accurately, this seems impossible.”
  • Those deadly mudslides you’ve read about? Expect worse in the future. (Wall Street Journal) Pull-quote: “Montecito is particularly at risk as the hill slopes above town are oversteepened by faulting and rapid uplift, and much of the town is built on deposits laid down by previous floods. Some debris basins were in place, but they were quickly overtopped by the hundreds of thousands of cubic yards of water and sediment. While high post-fire runoff and erosion rates could be expected, it was not possible to accurately predict the exact location and extreme magnitude of this particular storm and resulting debris flows.”
  • Evacuation Areas Map.
  • Thomas Fire: Forty Days of Devastation (LA Times) Includes what happened to Montecito. Excellent step-by-step 3D animation.
