Privacy and Anonymity


It has been an interesting summer on the privacy front. Following the spring revelations at Harvard about email searches, we have watched Edward Snowden subject the intelligence agencies of the U.S. to a version of the classic Chinese water torture (except he has replaced the drops of water with bowling balls) by releasing details of all the information those agencies have been gathering. I’ve been a bit distressed by how little public discussion there has been about any of this, although an interesting alliance of the far left and the far right in the House of Representatives (the self-proclaimed “Wing Nuts”) seems to be paying some attention.

There are also a host of interesting questions that aren’t being addressed, but which the different sides seem to assume have already been answered (often in different ways). One of these questions is whether gathering data is itself a privacy issue, or whether the issue only arises if and when the data is accessed. Those defending the gathering of all of the data seem to think that it is access that needs to be monitored, telling us that we shouldn’t worry because while they have all that data, actual access to it is far more controlled. Those worried about the gathering believe that the act of collecting the data is the problem, often pointing out that once the data is collected, someone will do something with it. Another question is whether privacy is violated when data is examined algorithmically, rather than when a human being looks at it. Again, those defending the various data-gathering programs seem to hold that having computers look at the data has no privacy implications, while those objecting to the programs think that even algorithms can violate privacy.

I think these are both interesting questions, and I’m not sure I know the right answer to either of them. I have been able to construct some cases that make me lean one way, while others make me lean the other.

Another issue I don’t see being raised has to do with the difference between privacy and anonymity, and how the two relate. In fact, what I see in a lot of the technical discussions around data aggregation is an implicit equating of privacy with anonymity. This equivalence, I think, does both sides a disservice, but especially the side wanting to argue for privacy.

Anonymity, roughly put, is the inability to identify the actor of a particular action, or the individual with whom some set of properties is associated. The inability to identify may be because you can’t see the individual (as in symphony auditions, where the players are placed behind a screen, a practice that has increased the number of female members of major orchestras), because there is no identifier associated with some document, or because a database has been scrubbed so that only some of the data is associated with each record (although this is far more difficult than most people think).
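As an illustration of why scrubbing can be harder than it looks, here is a minimal sketch of a linkage attack (the names, records, and the idea of joining against a voter roll are all invented for illustration): a “scrubbed” table that drops names but keeps quasi-identifiers like zip code, birth year, and sex can be joined against an outside list that carries the same fields, putting the names right back.

```python
from collections import Counter

# Hypothetical "scrubbed" records: names removed, but quasi-identifiers
# (zip code, birth year, sex) left intact.
scrubbed = [
    {"zip": "02138", "birth_year": 1960, "sex": "F", "diagnosis": "flu"},
    {"zip": "02138", "birth_year": 1960, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "none"},
]

# A hypothetical outside list (say, a public voter roll) linking the same
# quasi-identifiers back to names.
voter_roll = [
    {"name": "Alice", "zip": "02138", "birth_year": 1960, "sex": "F"},
    {"name": "Bob",   "zip": "02138", "birth_year": 1960, "sex": "M"},
    {"name": "Carol", "zip": "02139", "birth_year": 1985, "sex": "F"},
]

def reidentify(records, roll):
    """Join scrubbed records back to names via quasi-identifier
    combinations that are unique in the outside list."""
    key = lambda r: (r["zip"], r["birth_year"], r["sex"])
    counts = Counter(key(v) for v in roll)
    names = {key(v): v["name"] for v in roll if counts[key(v)] == 1}
    return {names[key(r)]: r["diagnosis"] for r in records if key(r) in names}

print(reidentify(scrubbed, voter_roll))
# Every "anonymous" record is re-linked to a name.
```

In this toy data every quasi-identifier combination is unique, so every record is re-identified; real datasets are messier, but the mechanism is the same, which is why simply deleting the name column is not anonymization.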

Privacy is more difficult to characterize (take my course in the fall if you want lots more discussion of that), but it has more to do with not knowing something about someone. My medical records are private not because you don’t know who I am, but because you don’t have access to (or have the good taste not to access) those facts about me. What happens in Vegas stays in Vegas not because everyone there is anonymous (that would make hotel registration interesting), but because those who are there don’t tell.

I often think that voting is the best example to illustrate this distinction. You don’t want voting to be anonymous; it is a good thing to need to identify yourself at the polls and make sure that you are on the voter lists (how you do this, and how much trouble it should be, is a very different issue). But voting is a very private thing; you want to make sure that the vote I cast is private, both to protect me from any blowback (I grew up blue in a very red state) and to protect the integrity of the voting process itself (as long as voting is private, it is hard for someone trying to buy votes to determine whether the money spent led to the right result in any individual case).

One problem with this slushy notion of privacy is that it is hard to build a technology that will ensure it if you don’t know what it is. So a lot of work in the technology space that appears to preserve privacy actually centers on preserving anonymity. Tor is one of my favorite examples; it is often seen as privacy-preserving, but in fact it is designed to ensure anonymity.

The argument over the collection of metadata rather than data is all about this distinction. If (and it is a big if) the metadata on phone calls and internet communications reveals only the identity of those communicating, it violates the anonymity of those who are communicating. The analogy here is following someone and noting all of the people that person talks to, without actually hearing what the conversations are about. Such a thing would be creepy, but it isn’t clear (especially if you are following the person in public areas) that it violates anyone’s privacy.
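To make the point concrete, here is a small hypothetical sketch (the names and call records are entirely invented) of how much a watcher learns from call metadata alone, without a word of content: who talks to whom, and for how long.

```python
from collections import defaultdict

# Hypothetical call metadata: (caller, callee, minutes) -- no content at all.
calls = [
    ("alice", "bob",    5),
    ("alice", "clinic", 12),
    ("bob",   "clinic", 3),
    ("alice", "bob",    7),
]

# Even without hearing a word, the records reveal the map of associations:
# each party's set of contacts, and total time spent per pair.
contacts = defaultdict(set)
talk_time = defaultdict(int)
for caller, callee, minutes in calls:
    contacts[caller].add(callee)
    contacts[callee].add(caller)
    talk_time[(caller, callee)] += minutes

print(sorted(contacts["alice"]))    # ['bob', 'clinic']
print(talk_time[("alice", "bob")])  # 12
```

Nothing here touches the contents of any call, yet the graph it builds strips away the anonymity of the participants; whether that also counts as a privacy violation is exactly the question the essay is raising.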

Confusing privacy and anonymity also allows those who may be violating privacy to point out that ensuring anonymity helps bad people cover their bad actions (the standard “terrorists and child pornographers” argument, which reduces to some variation of “if we ensure anonymity, we help the terrorists and child pornographers”). No one wants to enable the bad actors to act in those ways, so it appears that we have to give something up (although, if you really believe in privacy as a right, perhaps you are willing to give some of this up, just as free speech has to include those who say things that you really don’t like).

I’d really like to see some deeper thinking here, although I expect that it won’t happen, at least in public. These are important issues, and they should be thought about calmly and not in the heat of some shocking revelation (like the current discussion) or in reaction to some horrific event (like the 9/11 terrorist attacks, that gave us the current legal frameworks). One of the problems with privacy law in the U.S. is that it tends to be reactive rather than contemplative.

Maybe we can do better at Harvard. I hope so.



1 Comment

  1. Veli Bilgilendirme

    November 18, 2013 @ 2:14 am


    In my opinion, when a person agrees to provide his or her personal information in order to receive a service or to become a member somewhere, there is always the possibility that the data will be used elsewhere. The entity receiving personal data has a responsibility to explain to the data owner the degree to which the data might be used, to get the owner’s permission before using it, or to provide some assurance that the data will not be used for anyone else’s benefit.
