(or: Inadvertently Illegal Programming, A Primer)
Earlier this month, Google’s official engineering blog confessed that the company’s Street View cars and bikes have “inadvertently” gathered personal data in transit on unencrypted Wi-Fi networks for the past three years (see the post: Wi-Fi Data Collection). As chronicled in major news stories in the past three weeks, Google’s actions are under scrutiny by government regulators everywhere (see links to news stories at the end of this post).
[One of Google’s Ominous-Looking Spy Cars
photo by byrion]
This is a topic close to my heart because my research group has been conducting similar surveys of wireless signals for the past five years as part of a project funded by the US National Science Foundation. Here’s a picture of our own slightly less obtrusive Wi-Fi sampling car in South Central Los Angeles in 2005. (On second thought, we shouldn’t have chosen a black SUV. Too scary.)
[At least our antenna isn’t as scary as Google’s.
Photo credit: Hope Hall]
Our project was research and not commerce, so thanks to something called the National Research Act of 1974, in order to begin our research project we needed ethics approval from a university panel of researchers and civilians. We had to investigate, explain and justify the privacy implications of our study to a group called the university’s “institutional review board” before we started doing anything.
My argument to the board went like this: We want to count the presence and absence of Wi-Fi networks, and we want to uniquely identify them so that we can tell where Wi-Fi exists and where it doesn’t. (This is the same thing Google wants to do. Other companies do it too, like Microsoft and Skyhook Wireless.) A main commercial motivation for this kind of project is to improve GPS accuracy (try it: http://loki.com/). Our research motivation was to understand the evolution and diffusion of computer networks.
This is akin to doing a survey of telephone adoption by counting telephone poles. We can do this research from public streets and sidewalks. We are looking at unencrypted information that is broadcast in the clear to everyone anyway. This information is called the “management frame,” and it is what creates the list of available Wi-Fi access points in the upper right (Mac) or lower right (PC) corner of your laptop screen. We don’t look at the content of the transmissions.
Although our equipment looks different from your laptop (and works faster and on more channels), our code does essentially the same thing that your laptop does when you open it in a new place. It listens to see if there is any Wi-Fi around. That’s it. To me, it didn’t seem like a difficult ethical case to make. Indeed, we easily passed ethics review, and our research was declared exempt from further review.
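For the technically curious, here is a rough sketch of what "listening only to the management frame" can look like in code. This is not our survey software and certainly not Google's. It is a minimal illustration in Python that assumes each input is a raw 802.11 frame with no radiotap header and no trailing checksum. The function names (`classify`, `parse_beacon`, `survey`, `make_beacon`) are made up for this example. The only facts it relies on are the standard 802.11 layout: the frame type and subtype sit in the first byte, a beacon is management subtype 8, and the network name travels as a tagged element after a 24-byte header and 12 bytes of fixed fields.

```python
import struct

FRAME_TYPES = {0: "management", 1: "control", 2: "data"}
BEACON_SUBTYPE = 8


def classify(frame: bytes):
    """Return (type_name, subtype) read from the first Frame Control byte."""
    fc0 = frame[0]
    frame_type = (fc0 >> 2) & 0b11      # bits 2-3: type
    subtype = (fc0 >> 4) & 0b1111       # bits 4-7: subtype
    return FRAME_TYPES.get(frame_type, "reserved"), subtype


def parse_beacon(frame: bytes):
    """Return (BSSID, SSID) for a beacon frame, or None for anything else."""
    kind, subtype = classify(frame)
    if kind != "management" or subtype != BEACON_SUBTYPE:
        return None  # data frames are dropped here; their payload is never read
    bssid = ":".join(f"{b:02x}" for b in frame[16:22])  # address 3 in a beacon
    pos = 24 + 12  # skip the MAC header and the fixed beacon fields
    ssid = ""
    while pos + 2 <= len(frame):
        element_id, length = frame[pos], frame[pos + 1]
        if element_id == 0:  # element 0 carries the network name (SSID)
            ssid = frame[pos + 2:pos + 2 + length].decode("utf-8", errors="replace")
            break
        pos += 2 + length
    return bssid, ssid


def survey(frames):
    """Count networks: map each BSSID seen in a beacon to its SSID."""
    seen = {}
    for frame in frames:
        result = parse_beacon(frame)
        if result is not None:
            seen[result[0]] = result[1]
    return seen


def make_beacon(bssid: bytes, ssid: str) -> bytes:
    """Build a synthetic beacon so the sketch can run without a radio."""
    header = bytes([0x80, 0x00]) + b"\x00\x00" + b"\xff" * 6 + bssid + bssid + b"\x00\x00"
    fixed = b"\x00" * 8 + struct.pack("<H", 100) + struct.pack("<H", 0x0001)
    name = ssid.encode("utf-8")
    return header + fixed + bytes([0, len(name)]) + name


if __name__ == "__main__":
    beacon = make_beacon(bytes.fromhex("001122334455"), "cafe")
    data = bytes([0x08, 0x00]) + b"\x00" * 22 + b"someone's private traffic"
    print(survey([beacon, data, beacon]))
```

Notice that the data frame falls out at the first `if`. To keep its contents you would have to write a separate branch that stores them somewhere.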
To give you an example of what we see, here is a screenshot from a popular open source wireless sniffer, kismet. (We use a slightly modified version.)
[Kismet screenshot.]
Google was trying to do the same thing that my wireless research group was doing — again, no ethical problems there. However, they claim to have “inadvertently” also listened to the content of communications. (This is called “payload” data.) Here’s the problem with the story we’re getting from Google: the word “inadvertently.”
I see no way that this could be inadvertent. Continuing my earlier metaphor: If your plan is to count telephone poles, how would you “inadvertently” tap telephone lines and transcribe everything that you hear? The two actions are quite different. Of course, I don’t know how Google wrote its software for these capture platforms. Our team uses slightly modified versions of open source wireless tools. It is possible to use tools like these to save the “payload” data from wireless systems. There are legitimate engineering reasons for doing so if you are trying to improve the performance of your network. That’s why these tools exist.
However, I don’t understand how we could ever have “inadvertently” done that. It isn’t like stumbling over a banana peel or forgetting that you left a light switch (or variable) turned on. Even sampling some of the payload data would start producing about 10x as much data as listening only to the management frame. Do you ever go to the store for a can of soda and “inadvertently” fill your cart with ten cans? I didn’t think so.
If you inadvertently started buying 10x as many groceries as you wanted, I bet you’d notice. I bet it would take you less than three years to notice, too.
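If you want to play with the grocery-cart arithmetic yourself, here is a back-of-the-envelope sketch in Python. Every number in the example call is a placeholder I made up for illustration. None of them are measurements from our project or from Google, and the function name `capture_volume` is hypothetical. The point is only that once any share of payload is kept, the total on disk is governed by how much traffic is on the air, not by how many networks you are counting.

```python
def capture_volume(seconds, beacons_per_sec, beacon_bytes,
                   data_frames_per_sec, data_frame_bytes, keep_one_in):
    """Return (management_bytes, payload_bytes, growth_factor).

    keep_one_in is the sampling rate for payload: 10 means one data
    frame out of every ten is stored. All inputs are assumptions.
    """
    management = seconds * beacons_per_sec * beacon_bytes
    payload = seconds * data_frames_per_sec * data_frame_bytes // keep_one_in
    growth = (management + payload) / management
    return management, payload, growth


if __name__ == "__main__":
    # Placeholder inputs only: one hour of driving, made-up frame rates and sizes.
    management, payload, growth = capture_volume(
        seconds=3600,
        beacons_per_sec=10,
        beacon_bytes=200,
        data_frames_per_sec=500,
        data_frame_bytes=1200,
        keep_one_in=10,
    )
    print(f"management only: {management} bytes")
    print(f"sampled payload: {payload} bytes")
    print(f"disk use grows by a factor of {growth:.1f}")
```

Plug in your own guesses. It is hard to find plausible inputs where the extra data would go unnoticed for three years.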
The only interpretation I can think of is that the word “inadvertently” is being applied by the legal department. The real chain of events is probably that the coders at Google intentionally designed these systems to act in this (illegal) way, but they didn’t understand the legal and PR implications. Programmers may have set it up on their own initiative and not briefed anyone else who could have seen this disaster looming, but that isn’t “inadvertent.” And it isn’t a “programming error” — another phrase that is being used in the press.
From here Google looks pretty guilty. Now in the most recent news reports it looks like they are trying to destroy the data as quickly as possible as a way out of the scandal. But looking at the data would make it even clearer that its collection wasn’t accidental. So in this case destroying the private data may not be a way to protect the privacy of those they snooped on — instead it seems like a way to protect Google’s nontraditional use of this word: “inadvertently.”
See visualizations from our (legal and ethical) Wi-Fi research: The RED Project. From: Sandvig, C. (2007). The RED Project: Rendering Electromagnetic Distributions. Vectors: Journal of Culture and Technology in a Dynamic Vernacular. 3(1).
Thanks to Rajiv Shah for suggesting this post.
Related news coverage:
“Google Says it Collected Private Data by Mistake” (NYT), “Google Balks at Turning Over Private Internet Data to Regulators” (NYT), “FTC Asks Google to retain WiFi data” (Washington Post).
UPDATED on 5/29: Added kismet screenshot, fixed typo.