A novel way of defending against mass uses of our data


AI is getting better at performing mass categorization of photos and text. A developer can scrape a bunch of photos from, say, Facebook — either directly, likely violating the terms of service, or through offering an app by which people consent to the access — and then use a well-trained categorizer to automatically discern ethnicity, gender, or even identity.
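To make the scale of the problem concrete, here is a minimal sketch of bulk categorization using an off-the-shelf image classifier from torchvision. The "scraped_photos" folder is a hypothetical stand-in, and a real adversary would swap in a face-recognition or attribute model, but the pipeline would look much the same.

```python
# Sketch: run an off-the-shelf pretrained classifier over a folder of photos.
# "scraped_photos" is a hypothetical directory; the ImageNet classes here are
# generic, but an identity or demographic model would plug in the same way.
import torch
from torchvision import models
from PIL import Image
from pathlib import Path

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]

for photo in Path("scraped_photos").glob("*.jpg"):
    img = Image.open(photo).convert("RGB")
    batch = preprocess(img).unsqueeze(0)
    with torch.no_grad():
        probs = model(batch).softmax(dim=1)
    top = probs.argmax(dim=1).item()
    print(photo.name, labels[top], f"{probs[0, top]:.2f}")
```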

Some defenses against abuse can be built in, starting with a technical parlor trick and ending with support from the law. There’s been promising research on “image perturbation”: adjusting a photo in a way that is unnoticeable to a human but that thoroughly confuses the standard image recognition tools that would otherwise make it easy to categorize the photo. (There’s a helpful video summary of some of the research by Nguyen, Yosinski, and Clune available here.)
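For a sense of how such a perturbation works under the hood, here is a hedged sketch of one well-known technique from this literature, the fast gradient sign method of Goodfellow et al. It is not the specific method in the research linked above, and the model, epsilon value, and normalization shortcuts are illustrative assumptions.

```python
# Sketch of the fast gradient sign method (FGSM): nudge every pixel a tiny
# step in the direction that most increases the classifier's loss, producing
# an image that looks unchanged to a person but is often misclassified.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

def fgsm_perturb(image: torch.Tensor, label: int, epsilon: float = 0.01) -> torch.Tensor:
    """image: 1x3xHxW tensor with values in [0, 1]; label: the true class index.
    Returns a slightly perturbed copy that tends to change the model's prediction.
    (Input normalization is omitted here for brevity.)"""
    image = image.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), torch.tensor([label]))
    loss.backward()
    perturbed = image + epsilon * image.grad.sign()
    return perturbed.clamp(0.0, 1.0).detach()
```

The perturbation budget is kept small enough that the change is imperceptible, yet the gradient step is often enough to flip the predicted label.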

For example, this intrepid group of MIT students can make Google’s otherwise-reliable image recognition algorithm mistake a turtle for a rifle, or a cat for … guacamole.

I’m part of a team at MIT and Harvard within the Assembly program — Thom Miano, Dhaval Adjodah, Francisco, Daniel Pedraza, Gretchen Greene, and Josh Joseph — that’s working on tools so that users can upload invisibly modified photos of themselves to social media without making themselves so readily identifiable.

Those modifications won’t thwart AI tools forever — but they’ll represent an unmistakable indication of user preference, and the law can then demand that those preferences be respected.

There is already a model for this: photos taken with a smartphone are invisibly labeled with time, date, and location in their EXIF metadata. Facebook and Twitter have for years automatically stripped this information out before showing those photos on their services, avoiding a privacy nightmare in which a single photo could instantly locate someone. (There would no doubt have been Congressional hearings had they failed to do this.)
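As a rough illustration of that stripping step, here is a minimal sketch using Pillow that re-saves only the pixel data, leaving the EXIF block (and its GPS coordinates) behind; the file names are placeholders.

```python
# Sketch: copy an image's pixels into a fresh file, dropping the EXIF
# metadata (timestamps, GPS coordinates) that rides along in the original.
from PIL import Image

def strip_exif(src_path: str, dst_path: str) -> None:
    img = Image.open(src_path)
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))  # pixel data only, no metadata
    clean.save(dst_path)

strip_exif("photo_with_gps.jpg", "photo_clean.jpg")  # placeholder file names
```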

They can and should similarly undertake, on behalf of their users, to perturb images with the latest technology to prevent wide-scale AI-assisted identification by others, and to provide an anchor similar to “do not track” to make user preferences about bulk downstream use abundantly clear.
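No such standard exists today, but as a hedged sketch of what a machine-readable preference anchor could look like, a platform might embed a flag in the image file itself. The tag name below is entirely hypothetical and nothing currently reads it.

```python
# Sketch: write a hypothetical "no bulk recognition" preference flag into a
# PNG text chunk. This only illustrates how a "do not track"-style signal
# could travel with the image; its force would have to come from platforms
# and the law agreeing to honor it.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.open("profile.png")            # placeholder file name
meta = PngInfo()
meta.add_text("no-bulk-recognition", "1")  # hypothetical preference tag
img.save("profile_tagged.png", pnginfo=meta)
```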

And these defenses need not apply only to photos. An insurance company had an opt-in plan for Facebook users to have the nature of their posts influence their car insurance rates. As the Guardian described it:

Facebook users who write in short, concise sentences, use lists, and arrange to meet friends at a set time and place, rather than just “tonight”, would be identified as conscientious. In contrast, those who frequently use exclamation marks and phrases such as “always” or “never” rather than “maybe” could be overconfident.
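The signals described there are shallow enough to sketch in a few lines. The word lists and features below are invented for illustration, since the insurer’s actual model was never published.

```python
# Sketch: crude text signals of the sort the Guardian excerpt describes,
# counting exclamation marks, absolute words, and hedging words in a post.
import re

ABSOLUTE_WORDS = {"always", "never"}
HEDGING_WORDS = {"maybe", "perhaps"}

def crude_personality_signals(post: str) -> dict:
    words = re.findall(r"[a-z']+", post.lower())
    return {
        "exclamations": post.count("!"),
        "absolute_words": sum(w in ABSOLUTE_WORDS for w in words),
        "hedging_words": sum(w in HEDGING_WORDS for w in words),
    }

print(crude_personality_signals("We should ALWAYS meet at 8pm sharp!!"))
# {'exclamations': 2, 'absolute_words': 1, 'hedging_words': 0}
```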

There are also techniques that have moved from academia to industry, like “differential privacy” and its precursors, in which decoy data (say, a few random stray likes added to a profile) is introduced so that helpful generalizations can still be drawn from bulk data across lots of people, while individual privacy is protected by preventing easy inferences about any single person.
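Here is a minimal sketch of the idea using randomized response, a simple precursor of differential privacy rather than any particular deployed system: each profile reports a decoy answer some of the time, so no single report can be trusted, yet the population-level rate can still be recovered. The noise level and the 30% example rate are illustrative.

```python
# Sketch of randomized response: flip each individual's true "like" to a
# random decoy with some probability, then invert the noise in aggregate.
import random

FLIP_PROB = 0.25  # illustrative noise level

def randomized_response(true_value: bool) -> bool:
    """Report the truth most of the time, a coin flip otherwise."""
    if random.random() < FLIP_PROB:
        return random.random() < 0.5
    return true_value

def estimate_true_rate(reports: list) -> float:
    """Recover the population-level rate from the noisy reports."""
    observed = sum(reports) / len(reports)
    return (observed - FLIP_PROB * 0.5) / (1 - FLIP_PROB)

# Example: 10,000 users, 30% of whom truly like some page.
truth = [random.random() < 0.30 for _ in range(10_000)]
reports = [randomized_response(t) for t in truth]
print(round(estimate_true_rate(reports), 3))  # close to 0.30
```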