Determining the trustworthiness of what we read online is important.


Yesterday I was informed that Senator Tom Coburn published a report entitled “Wastebook, A Guide to Some of the Most Wasteful and Low Priority Government Spending of 2011”, which included my NSF grant “Trails of Trustworthiness: Understanding and Supporting Quality of Information in Real-Time Streams” as example number 34.

I was not familiar with Senator Coburn’s publication and was surprised and curious to see what would characterize my project as “wasteful”. My colleagues and I have been working on the problem of information reliability over the last 5 years and we have published more than a dozen papers in refereed conferences and journals, three of which have received the “Best Paper” distinction. What was it that Senator Coburn found so unacceptable that reviewers and scientific audiences overlooked? After reading the relevant sections of his publication, I was even more confused as to what the Senator deemed objectionable. Everything he mentions regarding my project seems positive:

Do you trust your twitter feed? The National Science Foundation is providing $492,055 in taxpayer dollars to researchers at Wellesley College to answer that question.    Researchers cite “the tremendous growth of the so-called Social Web” as a factor that will “put new stress to human abilities to act under time pressure in making decisions and determine the quality of information received.” Their work will analyze the “trails of trustworthiness” that people leave on information channels like Twitter. They will study how users mark certain messages as trustworthy and how they interact with others whose “trust values” are known.    The NSF grant also includes funding for an online course to study “what critical thinking means in our highly interconnected world,” in which we might be “interacting regularly with people we may never meet”.

However, the entry in the report is condescendingly titled “To Trust or Not to Trust Tweets, That is the Question.” This suggests that the author of the report may think that trust in online communication is not worth studying, or that Twitter is unworthy to be mentioned in a scientific proposal. But to those who have actually read the details of the proposal, this is a superficial criticism. What we are proposing to do is to create semi-automatic methods for helping people determine the credibility of the information they receive online. From recent events in the Arab world, Russia, and Mexico, for example, we know that people look to online media to receive information they can trust, while oppressive governments and drug cartels try to confuse them by spreading misinformation. Even in the US, the cost of misinformation is high; investors have lost millions from untrustworthy online information and little-known groups are trying to influence our elections by spreading lies. Being able to determine what information can be trusted has always been important and will be critical in the future.

It’s unlikely that Senator Coburn himself actually read thousands of NSF grant descriptions to determine which ones appear wasteful. Furthermore, such proposals are written for a scientific audience and require specific expertise to evaluate. And I am sure that the Senator does not believe that critical thinking education and technologies for supporting trust and credibility are “wasteful”. So how did this proposal end up in his report?


On the Senator’s “Wastebook” web page, there is a link next to a picture of Uncle Sam inviting readers to “Submit a tip about Government Waste”. By clicking on it, one can suggest examples of wasteful spending to the Senator. I wouldn’t be surprised if someone with only a cursory understanding of our proposal recommended it as wasteful. In this case — and perhaps in many others — a provider of online information has misled Senator Coburn. Therefore, this report itself is proof that determining the trustworthiness of what we read online is important.


The title may seem redundant. Of course, if you are going to predict, you should predict the future. What else would you predict, the past? But when referring to social media data, it may not be that redundant. In recent years there has been an increase in research that uses social media data to predict the future, to predict the present, and to predict the past using knowledge acquired in the future.

Why is predicting important? Predicting is equivalent to intelligence, with an important qualification: We admire the intelligence of someone who can predict what is going to happen, but only when they can explain why they are able to do so. If one (e.g., an octopus) is able to predict without explanation, we tend to downgrade it as coincidence.

Earlier today, the Pew Research Center on Journalism published an analysis entitled “Twitter and the Campaign”. They present a detailed study of millions of tweets and blogs, examining what people say on social media about the candidates for the 2012 elections. (Not too many nice things, it turns out, except for Ron Paul, who, at the same time, is trailing in the polls.)

So, what does this mean for the predictive power of Twitter? Is he going to win because tweets have good things to say about him, or will he lose because tweets have good things to say about him? (Hint: The answer is “yes”.)

Shepard Fairey meets Angry Birds: Poster of our 2011 ICWSM submission "Limits of Electoral Predictions using Twitter"

Earlier this year, my colleagues Eni Mustafaraj and Dani Gayo-Avello, our student Catherine Lui, and I studied this question: can one predict the outcome of the US congressional elections by analyzing social media data? We did not find encouraging results in either the Google Trends data or the Twitter data, hence the ingenious poster above that Dani designed.

When it comes to something as important as the elections, social media will be manipulated, because the stakes are too high. One should keep that in mind as we get closer to election time and “news articles” start appearing that argue someone will win or lose based on the number of friends or followers the candidate has. If the author gets it right, he will make sure to remind us in the future. If he gets it wrong, he will be the first to forget it.

This does not mean that nothing can be predicted using social media. Movie sales can be predicted, as Bernardo Huberman and his colleague showed. Flu outbreaks and periodic sales can be predicted, too. But not elections. At least not without some sophisticated filtering that makes the social media data as representative as the samples of professional pollsters, and therefore competitive with them.



To My Twitter Editors: Thank you!


Thank you, my Twitter Editors!

I am writing this on Thanksgiving Day, and it is only appropriate that I start my blogging experience with a “Thank You”. It goes out to my Twitter editors. Who are they? The people I follow on Twitter. Some people call them “Twitter friends”, but I think “Twitter Editors” is more appropriate. Let me explain.

Twitter is a “microblogging” online service. This means you can write short notes of up to 140 characters that your “followers” will see; these are the people who want to be informed by what you write. In turn, you read the notes of the people you follow. To distinguish between the two, researchers often refer to this second group as your Twitter “friends”.

The way I use Twitter is by carefully selecting whom to follow. I choose to follow people who (a) talk about issues I care about, and (b) do not talk much when they do not have something to say. I do not filter out those I disagree with, because they often have interesting things to say. Finding good editors is not automatic, but it is worth the effort.

These days my research interests are mainly around the technical evolution of the Social Web, the propagation of information and misinformation in social and traditional media, graph algorithms and visualization, the state of Computer Science and Media Arts and Sciences education, and epistemology (how do you know what you know). You can see details of my interests on my web page.

I want to be able to follow developments in all of the above areas, but that’s very difficult: It means that I would have to scan the news from numerous publications and sources hoping to find interesting articles that I should read. That requires a lot of time and effort on a daily basis. I can’t do it while keeping my sanity.

Here is where my Twitter Editors come in: They are also interested in some of the issues I am, and they tweet about them. They put effort into choosing the 140 characters of a tweet and they often provide a link to the original source. Very often, they point me to some piece of information that I would have missed. And I try to return the favor: I am also trying to be a good editor for those who follow me. I think it is a fair deal, one that increases the quality in the overflow of information we are experiencing.

Being Greek, I also care a lot about the current situation and debates in my native country. But I cannot follow them with a single Twitter account, since my followers would see items written in a language they may not be able to read. Most likely, they would not care, either. So I have a second Twitter account for issues related to Greek culture and politics. Of course, I choose my Greek Twitter editors using the same criteria.

Twitter, unlike Facebook, allows you to separate your friends and followers, and this is what makes it possible to create your group of Editors. This is also the reason I do not use Facebook (though I have an account). The signal-to-noise ratio in Facebook is too low for me.

I am sure that there are many models for using Twitter, Facebook, Google+ and the other popular social network services; I am just describing the model that works for me. I highly recommend it. But it only works thanks to my Editors. Thank you!


