This post is part of Bill of Health‘s symposium on the Law, Ethics, and Science of Re-Identification Demonstrations. Background on the symposium is here. You can call up all of the symposium contributions by clicking here. —MM
In Part 1 and Part 2 of this symposium contribution, I wrote about a number of re-identification demonstrations and how they were reported, both by the popular press and in scientific communications. Beyond the ethical concerns I raised about the accuracy of some of these communications, the conduct of re-identification research and de-identification practice raises additional ethical, “scientific ethos”, and pragmatic public policy considerations that warrant more thorough discussion and debate.
First Do No Harm
Unless we believe that the ends always justify the means, even obtaining results that are useful for guiding public policy (as was the case with the PGP demonstration attack’s validation of “perfect population register” issues) doesn’t necessarily mean that the conduct of re-identification research is on solid ethical footing. Yaniv Erlich’s blog post “A Short Ethical Manifesto for the Privacy Researcher”, contributed as part of this symposium, offers this wise advice: “Do no harm to the individuals in your study. If you can prove your point by a simulation on artificial data – do it.” This is very sound ethical advice in my opinion.
I would argue that the re-identification risks for those individuals in the PGP study who had supplied a 5-digit Zip Code and full date of birth were already understood to be unacceptably high (if these persons were concerned about being identified), and that no additional research whatsoever was needed to demonstrate this point. If further arguments were needed about the precise levels of risk, they could have been adequately addressed through the use of probability models. I’d also argue that the “data intrusion scenario” uncertainty analyses that I discussed in Part 1 of this symposium contribution already accurately predicted the very small re-identification risks found for the sort of journalist and “nosy neighbor” attacks directed at the Washington hospital data. When strong probabilistic arguments can be made regarding potential re-identification risks, there is little justification for undertaking actual re-identifications that can affect specific persons.
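As a rough illustration of what such a probability model might look like, the sketch below estimates how often a person would be the only one in their Zip Code with a given full date of birth. It is a deliberately simplified model resting on assumptions of my own choosing, not the analysis used in any of the studies discussed here: birth dates are treated as independent and uniformly spread over a fixed span of years, and the function names, the Zip Code population of 25,000, and the 80-year span are all hypothetical. Note also that being unique in the population is only an upper bound on re-identification risk, since an attacker still needs a sufficiently complete population register to find the match.

```python
def prob_unique(population: int, n_cells: int) -> float:
    """Probability that one person shares their quasi-identifier
    combination with nobody else among `population` people, assuming
    (unrealistically) that all `n_cells` combinations are equally
    likely and that people are independent of one another."""
    return (1.0 - 1.0 / n_cells) ** (population - 1)


def expected_group_size(population: int, n_cells: int) -> float:
    """Expected number of people (including the target) who share the
    target's combination, under the same uniform assumption."""
    return 1.0 + (population - 1) / n_cells


# Hypothetical inputs, chosen only for illustration.
zip_population = 25_000          # assumed residents of one 5-digit Zip Code
birth_date_cells = 80 * 365      # assumed 80-year span of possible birth dates

p_unique = prob_unique(zip_population, birth_date_cells)
group_size = expected_group_size(zip_population, birth_date_cells)

print(f"Estimated share of residents unique on date of birth: {p_unique:.1%}")
print(f"Expected number sharing a given date of birth: {group_size:.2f}")
```

Even under these crude assumptions, a calculation of this kind conveys the order of magnitude of the risk without touching any real person's record. A more careful model would replace the uniform assumption with actual age distributions and Zip Code population counts.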
Looking more broadly, it is more open to debate whether the earlier re-identification attacks in January by the Erlich lab on the CEPH – Utah Residents with Northern and Western European Ancestry (CEU) participants could have been warranted by virtue of having exposed a previously underappreciated risk. However, I think an argument could be made that, given the prior work by Gitschier, which had already revealed the re-identification vulnerabilities of CEU participants, the CEU portion of the Science paper may not have served any additional purpose in directly advancing the science needed to develop good public policy. Without the CEU re-identifications, though, it is unclear whether the surname inference paper would have been published (at least by a journal as prominent as Science), and it seems quite unlikely that it would have attracted nearly the same level of media attention.