Algorithmic Bias: Where is the Problem?


What is algorithmic bias? That is, how can we actually define it in a meaningful, constructive way that can help us to ultimately create a more equitable society to live in? To begin thinking about this question more deeply, we must consider a few different ideas of algorithmic bias, some clarifications, and then what benchmark we are comparing algorithms to.

When I first read the ProPublica article about predictive policing (found here) over a year ago, I was caught off guard. I was convinced that there was some sort of problem, but I had to work through what exactly the problem was, what it meant with regard to algorithms, and what responsibility it placed on the software engineers who design them. My view shifted after I read some clarifications of the ProPublica article and some statistical studies showing that the data itself was biased (these studies drew on the data ProPublica used, all of which it responsibly published). It’s also important here to define what I mean by bias. In this colloquial sense, I simply mean that the data shows a disparate impact against a group of people based on ethically non-essential characteristics, like race. I believe this is a common use of the term in this context.

Following the ProPublica article, a common reaction is to be up in arms about the dangers of a technology like predictive policing. Will it increase the disparity? Will it lock the disparity in place so that we can’t improve it? These fears are legitimate, and it is important to first acknowledge that the article unearths a real problem. But we should not jump to a conclusion about what is to blame, namely the algorithms. It is a major error to blame algorithms for all bias simply because we don’t understand what an algorithm is doing or lack information about it. In this case, it turns out that the algorithm itself was okay, but the data was skewed because of bias that already exists in the world. The important point is that the algorithm was constructed so that it did its job exactly, with no “bias” or mistakes of its own. The data it used to make predictions, in this case about which areas were more likely to have crime, was skewed by an existing disparity in the world.
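To make the point above concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two areas, the incident counts, and the assumption that the skew comes from incidents being recorded at different rates. It is not the model from the ProPublica article. It only shows that a predictor can do its arithmetic faithfully and still reproduce a skew that was already in its input.

```python
# Hypothetical toy numbers: both areas have the SAME underlying number of
# incidents, but (by assumption) incidents in area A are recorded more often.
true_incidents = {"A": 100, "B": 100}
recording_rate = {"A": 0.9, "B": 0.5}  # assumed share of incidents that reach the dataset

# The dataset the algorithm actually sees.
recorded = {
    area: round(true_incidents[area] * recording_rate[area])
    for area in true_incidents
}

def predict_share(counts):
    """Allocate predicted risk in proportion to recorded counts."""
    total = sum(counts.values())
    return {area: n / total for area, n in counts.items()}

predicted = predict_share(recorded)
print(recorded)
print(predicted)
```

The predictor contains no mistake, yet it ranks area A as riskier than area B even though the underlying numbers are identical by construction. The disparity enters through the data, not through the calculation.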

If we are comparing algorithms to a benchmark of perfection, then they will fall short. Because of uncertainty, there will always be false positives and false negatives, although a very good algorithm can limit them. However, there are always false positives and false negatives when humans make important decisions too. As an example, consider the study that examined the increase in harsher rulings following the loss of a home football team…. So what sort of benchmark should we compare to? If an algorithm consistently performs better and more equitably than a human, should we use that algorithm? It seems that the rational answer should be a clear yes, but when we really think of putting life-or-death decisions into the hands of a machine, many would likely say that we should not. We should then consider what the real difference between the two cases is and why one might not be comfortable choosing the algorithm. Is it a lack of transparency? (If so, algorithmic transparency and education about the algorithm might help.) Or is it a lack of an intangible humanity? Moving forward, it will be exceedingly important for us as a society to think about the different ways we can define algorithmic bias constructively, and to consider in which situations the use of algorithms might be okay and why.
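One way to make the benchmark question measurable is to compute false positive and false negative rates separately for each group, and to run the same calculation on human decisions for comparison. The sketch below is only an illustration in Python: the function name, the record format, and the toy records are all made up for this post.

```python
from collections import defaultdict

def error_rates(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is (group, predicted, actual), where predicted and actual
    are booleans. A rate is reported as 0.0 when its denominator is empty.
    """
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, predicted, actual in records:
        c = counts[group]
        if actual:
            c["pos"] += 1
            if not predicted:
                c["fn"] += 1  # missed a real positive
        else:
            c["neg"] += 1
            if predicted:
                c["fp"] += 1  # flagged a real negative
    return {
        g: (
            c["fp"] / c["neg"] if c["neg"] else 0.0,
            c["fn"] / c["pos"] if c["pos"] else 0.0,
        )
        for g, c in counts.items()
    }

# Hypothetical decisions for two groups, "x" and "y".
records = [
    ("x", True, False), ("x", False, False), ("x", True, True), ("x", True, True),
    ("y", False, False), ("y", False, False), ("y", False, True), ("y", True, True),
]
rates = error_rates(records)
print(rates)
```

In this toy data the two groups have the same number of errors overall, but group "x" bears the false positives and group "y" the false negatives. Feeding a human decision-maker's record through the same function gives a like-for-like benchmark instead of a comparison against perfection.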

Universal Identity: Estonia e-residency


This week we discussed a case study of a country that is actually implementing a universal identification system for its citizens through technology. In Estonia, the government has already implemented, successfully by many measures, an identity system in which every citizen is given a uniquely identifying public/private key pair, generated by the government, so that citizens can fully identify themselves online. This gives all citizens an official digital identity, which they can use to vote online, complete tax returns online, obtain and fill prescriptions online, set up businesses, sign contracts, etc. There are clearly many benefits to a system like this. Loosely similar systems exist at smaller and less successful scales in other countries, like Social Security numbers in the US, but the scale and success of the Estonian operation, measured by the percentage of citizens who enroll, make it fundamentally different from any other. Many interesting questions arise about the Estonian system upon further investigation, and for the rest of this post I’ll focus on the effectiveness of certain abilities that the system creates, namely the ability to vote online.
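For readers unfamiliar with how a public/private key pair lets someone sign a contract online, here is a toy sketch in Python. It is not Estonia's actual scheme: I am assuming textbook RSA purely for illustration, the key is built from two publicly known primes so it is not secret, and real systems add padding and hardware-protected keys. The names `sign` and `verify` are my own.

```python
import hashlib

# Toy textbook RSA, for illustration only. P and Q are well-known Mersenne
# primes, so this "private" key is NOT secret or secure.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it to an integer below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public values (N, E) can check a signature."""
    return pow(signature, E, N) == digest_as_int(message)

contract = b"I agree to the terms."
signature = sign(contract)
print(verify(contract, signature))
print(verify(b"I agree to different terms.", signature))
```

The signature checks out against the original contract but not against the altered text, which is the property that lets a signed document or ballot be tied to one identity without sharing the private key.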


One article on the effectiveness of Estonia’s digital government (here) suggests that after the system was implemented to allow e-voting, e-voting actually became less popular, stating that “electronic voting is less popular because Estonians value their new found freedom to choose and many dress up in order to go to their polling station.” This is very interesting because it makes me wonder whether voter turnout as a whole increased because of the e-voting initiative, even though fewer people actually decide to vote online. That is, even though e-voting is less popular, perhaps more people were compelled to go out and vote after e-voting was pushed. This would be an interesting social phenomenon or experiment to look into. Given the potential of this technology and the social phenomena created by the Estonian e-government system, I am hopeful that there is a way to really increase voter turnout and participation in other civic functions such as the census. Given the paradigm shift that the Estonian government has brought into being, I feel that many of the fundamental issues of citizenship in a nation could be addressed more effectively through this new system.

Watching the Science Center: Video Surveillance in Public Spaces


This past week, we touched on topics of privacy with regard to the widespread use of video surveillance. The motivation for this post comes from a recent realization that there is a video camera overlooking the Harvard Science Center Plaza, one of the main student spaces between the dorms and classes, that is active 24/7 and whose feed is available for anyone to view. You can look at the live feed now. This troubled me a bit, given that most students aren’t actively aware that they are being monitored. It raised many questions for me, including the following: Is the footage saved for future use? Why is anyone allowed access to this footage? More importantly, what is the purpose of having the camera here in the first place? For this post I’ll focus on the final question: what are the different purposes for having surveillance cameras and video cameras in public spaces, and what further questions does this raise?

Although the website does not state an explicit reason for having this camera, we can reasonably assume it is a mix of the following common reasons: (1) to monitor events in a large public space for the safety of the community (similar to the way the Boston Marathon bombers were identified after the fact using security footage), and (2) as a fun way to let people see what is going on at the heart of Harvard’s campus.

The first reason seems plausible given that this sort of surveillance of public areas is an increasing trend across many institutions in the United States. However, it is difficult to know for sure for two reasons: (1) there is no information about whether or not the footage is saved for later use, although my hunch is that it probably is, and (2) it seems strange that the University would publish this footage publicly if safety were the primary purpose. This leads me to the second reason, which I think is the most likely one. Given that the footage is published on the commonspaces website, which promotes the public spaces at Harvard, it seems likely that the primary purpose of the footage is to give people a chance to see the plaza. Some people may just be interested in seeing what a common space actually looks like, and what better way than watching a live-streamed video of it?

Would the general Harvard community act differently if they knew that they were constantly being watched while in the plaza? Does the capturing of every moment in these public spaces take away from the ability to fully relax without fear that a video may be taken out of context and used against a person? Will constant surveillance in public spaces like the Science Center Plaza lead to social cooling, or to modified social behavior as implied in Foucault’s notion of the Panopticon? Foucault’s idea was that in a circular prison where the prisoners cannot see the guard, but are always aware that the guard can survey the prison at any time, the prisoners will behave differently because they believe they are always under surveillance. This idea can be pushed further in the notion of social cooling, where members of society become less likely to take risks and be themselves, and instead conform to the norm because they are being watched. Will an increased awareness that we are being watched change our behavior and ultimately change or take away from the Harvard experience, or is it just harmless and a bit creepy?

Harvard Data Privacy Policy: Too Much or Too Little?


Over the last few days, I’ve had the opportunity to read through Harvard’s “Policy on Access to Electronic Information” for the first time as an undergraduate at the College. To be frank, this is one of the few privacy policies I’ve actually read through entirely, despite accepting privacy policies all the time (e.g. Google Search, Gmail, Apple products, etc.). The policy itself is incredibly short and readable, unlike the long privacy agreements that are required prior to the use of just about any software product. Because the document is so readable, I was able to fully understand the policy in a short amount of time. (A copy of the policy can be found here: policy_on_access_to_electronic_information.)

The privacy policy itself is grounded in six important principles:

  1. Access should occur only for a legitimate and important University purpose.
  2. Access should be authorized by an appropriate and accountable person.
  3. In general, notice should be given when user electronic information will be or has been accessed.
  4. Access should be limited to the user electronic information needed to accomplish the purpose.
  5. Sufficient records should be kept to enable appropriate review of compliance with this policy.
  6. Access should be subject to ongoing, independent oversight by a committee that includes faculty representation.

Initially, I was a bit aghast at the ambiguity of the wording, which seemingly gives the University broad power to access information when it sees fit. For example, the third principle states that “notice should be given when user electronic information will be or has been accessed.” Notifying a user that his or her data is being investigated and accessed should be a key principle of the privacy agreement. However, the ambiguity of this statement already allows the University to wait for an undisclosed amount of time before it has to give notice to the user. There is no specific deadline after the information is accessed by which the University has to let the user know, which is troubling, as it could technically get away without ever notifying a user on the grounds that it was planning to in the future.

Moreover, in section III of the contents under the “Notice” section, the University obfuscates the principle further by saying that “notice ordinarily should be given to the user. All reasonable efforts should be made to give notice at the time of access or as soon thereafter as reasonably possible.” What exactly is considered a “reasonable effort,” and what is considered “soon thereafter”? There is a further issue in the ambiguity of the word “ordinarily.” What situation would be considered not ordinary, such that no notice has to be given to the user?

These were some of the first questions that came to mind. However, I soon realized that I was judging Harvard’s privacy policy against a baseline of complete and full ownership over all of my data. In reality, this isn’t a fair comparison, for two reasons: (1) full ownership over data comes with many tradeoffs and drawbacks, such that I am willing to hand over my data for certain benefits (such as automatic backups), and (2) compared to most corporations in the United States, Harvard provides much more protection for the privacy of student and faculty data. I don’t know how Harvard stacks up against other universities, but based on a few quick searches (Boston University Electronic Info Policy here), it seems that Harvard has more explicit guidelines on who can and cannot access data. As a side note, this policy is relatively recent; it was signed and put into effect on March 31, 2014. Moreover, as a result of this policy, Harvard has been able to provide a greater sense of security, in that petitions to search electronic data across the University must go through strict formal procedures that determine whether or not the search is permissible.

While the ambiguity of some of the wording in the privacy policy may be cause for concern, the current reality of the situation created by the policy is more reassuring than not and seems to be a step forward in the University’s efforts to protect its students and faculty.

Intro Blog Post


Hi all!

This is my first official blog post. Its purpose is just to introduce the blog and set the stage for future posts. In this blog, I will be posting weekly on whatever current philosophical and legal issues and topics I find interesting, based either on the discussions in class or on the assigned readings for IGA 538. The goal of this blog is to continue and push forward the discussion on these incredibly relevant and important issues. I hope to achieve this by offering my own opinions on these issues in a public forum. You can also comment on posts, which can help further the dialogue.

Thanks for reading the blog post! Enjoy.

