

On the need for Research Replication in Social Computing

A call for replicating Social Computing results as a necessary step in the maturity of the field.

The field of Social Computing is rather new, but it has been one of the most active in Computer Science in the last few years. Many new conferences have been created to host the research efforts of computer scientists, social scientists, physicists, statisticians and many other researchers and practitioners. The excitement generated by the opportunities that opened up through the relative ease of retrieving large amounts of data has led many young researchers to dive in and uncover the modes of social interaction.

At the risk of oversimplifying, one could say that the research papers we produce follow the general pattern of observational sciences:

  • We collect data that arguably can capture the phenomenon we want to study,
  • we may apply some sophisticated statistical tools or test a hypothesis using machine learning tools, and
  • analyze the results.

Our conclusions sometimes do not just state the phenomenon we observed; they expand from the specific findings to claim projections that go beyond the observed.

One of the reasons this approach seems familiar is that it resembles the one used in Experimental Computer Science. There, we measure the characteristics of the systems or algorithms we have built, and study their performance experimentally when exact analysis is not easy or even possible. This is a tried and true approach since, in the systems we build, we take great pains to avoid any behavior that is outside the specifications. In the artificial worlds we create, we try to control all aspects, and this process has produced amazing technological results.

On the other hand, this approach may be inappropriate or incomplete compared to those used in the Experimental Natural Sciences. Physicists, Biologists and Chemists would start with this approach to make initial sense of the data they are collecting, but this is just the beginning of the process. Replication of the research is normally needed to verify the validity of the original experiments. Sometimes the research results are not validated; nevertheless, even in this case the replication process provides insight into the workings of natural phenomena. Nature mostly repeats its phenomena consistently, though one may have to account for all the parameters that affect them. Sometimes this is not easy, and replication offers the best guarantee that the research findings are valid.

As we mentioned, Social Computing is now being done by researchers coming from many disciplines, but it is different from both Computer Science and the Natural Sciences. Though it has the potential of also becoming an experimental science, so far it is mostly an observational one. This, it turns out, is a very important distinction. Society is different from Nature in several important ways. Its basic building blocks are people, not atoms, chemical compounds or molecules. The complexity of their interactions is not easily tractable, to the degree that one may not even be able to enumerate all the factors that affect them. Moreover, people (and even social “bots” released on Social Media) do not behave consistently over time and under different conditions.

The closest relative to Social Computing is not Computer Science, we would argue, but Medical Science, where Natural Sciences phenomena are influenced by social conditions. In both the Medical and Natural Sciences, replication of results is considered an irreplaceable component of scientific progress. Any lab can make discoveries, but these discoveries are not considered valid until they have been independently replicated by other labs. Not surprisingly, replicating research findings is considered a publishable contribution, and researchers get credit for doing just that.

In Computer Science, replication has not been considered important or worth any credit, unless it reveals crucial flaws in the original research. It is unlikely, for example, that replicating Dijkstra’s Shortest Paths algorithm would contribute to the development of our discipline, and so it makes sense not to give credit for its replication. On the other hand, the inability to replicate Hopcroft and Tarjan’s triconnected-components algorithm was a significant development, and Gutwenger and Mutzel, who discovered and corrected the flaw, did receive credit for it.

We recognize the need for replicating Social Computing research results, as a way of establishing, under all meaningful conditions, the patterns that Social Media data reveal. We believe that such research replication will give credibility to the field. Failing that, we may end up collecting a large number of conflicting results that could discredit the whole field.


Political retweets do mean endorsement


Not many results that Social Media research has produced in the last few years are as reliably reproducible as the title of this post: Political RTs do mean endorsement. I wrote about the related research in my “Three Social Theorems” blog post a few weeks ago (this being the first theorem), and it was the theme of my talk during the “Truthiness in Digital Media” symposium at the Berkman Center.

This does not mean that every political RT is an endorsement. (If that were the case, I could break it with a retweet right now.) But it means that, when people retweet, that is, when they broadcast unedited to their own followers a tweet they received, most of the time they have read it and thought it worth spreading. They are, practically, endorsing it.

If we realize the above, then we should not be very surprised by the spread of the false news about Gov. Nikki R. Haley that the New York Times is reporting today: “A Lie Races Across Twitter Before the Truth Can Boot Up”. While the reporter Jeremy Peters is impressed by the speed of the false news, detailing the path it took (very good journalistic work, indeed), his most important point, I believe, is the one he makes in the second paragraph:

[…] it left news organizations facing a new round of questions about accountability and standards in the fast and loose “retweets do not imply endorsement” ethos of today’s political journalism.

Interestingly, it is mainly a few journalists who feel the need to explicitly include in their personal profile description a disclaimer to the effect of “My Retweets do not mean agreement”. In fact, out of more than 83,000 profile descriptions that my colleague Prof. Eni Mustafaraj and I have in our database of election-related tweets, we found only 53 that mention such a disclaimer. 31 of them belong to journalists.
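For the curious, the kind of filtering involved can be sketched in a few lines of Python. The profile strings and the regular expression below are my own invented illustrations, not the actual dataset or the exact method we used:

```python
import re

# Hypothetical profile descriptions -- the real dataset of 83,000+
# profiles is not reproduced here; these strings are illustrative only.
profiles = [
    "Political reporter. RTs are not endorsements.",
    "Dad, runner, coffee addict.",
    "Tech journalist. Retweets do not imply endorsement.",
    "Proud parent and weekend photographer.",
]

# Match a few common phrasings of the disclaimer.
disclaimer = re.compile(
    r"\b(rts?|retweets?)\s+(are\s+not|do\s+not\s+(imply|mean|equal))",
    re.IGNORECASE,
)

matches = [p for p in profiles if disclaimer.search(p)]
print(len(matches))  # 2
```

In practice one would need to handle many more phrasings (and languages), which is why we inspected the candidate matches by hand.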

Should we expect more such lies to race across social media in the remaining months before the elections? Probably yes.
Should we expect journalists to be much more cautious the next time they retweet something from a source they do not trust? Certainly yes.

But the good news is that lies, in general, have shorter, more questioned lives in Social Media. See Social Theorem 2 for the supporting research on this one.
Does it mean that no lie will ever be spread? Of course not.
But it means that most of the time they will be caught, especially as more people become aware of the RT Theorem and care about the truth.

Do all people care about the truth? Of course not. Take for example, Mr. Smith, the originator of the false blog post.



After the Elections


I recently sent the text below for publication in the Opinions section of the newspaper ΚΑΘΗΜΕΡΙΝΗ.


Whatever happens in the elections, the coming months will be difficult. But that must not be used as an alibi for violence.

In a few days we will be called to decide the composition of the next Parliament. On that composition will depend whether the next government continues to accept the Memorandum as it stands, or tries to change it or to break the agreement. This will likely be among the biggest decisions the new government will have to make, since it will define our country's role in relation to Europe and the world for many years. Such decisions do not change without painful consequences. But I do not want to discuss here what the right decision would be, for two reasons. One is that I have nothing to add to the arguments, for and against, that have been heard so far. The other is that the term "right decision" is not well defined. It is certain that, sooner or later, Greece will see better days. There is no reason to believe that we will remain in such a bad state until the end of time. At some point things will improve (which is why analyses of the Memorandum would do well to address how long it will take for things to improve, not whether they will improve at all).

I would like, then, to bring up a dimension that has perhaps not yet been widely understood: whether we stay in the Memorandum or abandon it, in the near future the quality of life will not be better than the quality of life we had before 2009. Whether we live with less borrowed money or with no ability to borrow at all, for a large part of the citizenry life will be very hard over the next two years. On this, I think, we all agree. No miracle is going to happen; we will not suddenly find the financial resources to solve our economic and social problems quickly and painlessly.

Whatever the next government decides, then, given how divided opinion is on the Memorandum, the minority view will conclude that it was right. "The majority view did not solve the problems," its supporters will say, "therefore we were right." That proposition contains a major logical fallacy, of course, but in moments of crisis, critical thinking is the first casualty.

What worries me is that the minority view will not simply stop at the above fallacy, but may decide that its "vindication" gives it the right to violent protest. And, of course, the supporters of the majority view will consider that the fresh popular verdict gives them the right to violent suppression. A vicious and painful cycle.

Things need not turn out the way I fear. The calm voices on all sides may manage to restrain us. But they had better start trying now. And good luck to us all.


Three Social Theorems


Dear Readers,

Below are my annotated notes from a talk I gave at Berkman’s Truthiness in Digital Media Symposium a few weeks ago. I introduced the concept of Social Theorems as a way of formulating the findings of the research that has been happening in the last few years in the study of Social Media. It is my impression that, while we publish a lot of papers, write a lot of blogs, and journalists report often on this work, we have trouble communicating our findings clearly. I believe that we need both to clarify our findings (thus the Social Theorems) and to repeat experiments so that we know we have enough evidence for what we really find. I am working on a longer version of this blog and your feedback is welcome!

P. Takis Metaxas

With the development of the Social Web and the availability of data produced by humans, Scientists and Mathematicians have taken an interest in studying issues traditionally of interest mainly to Social Scientists.

What we have also discovered is that Society is very different from Nature.

What do I mean by that? Natural phenomena are amenable to understanding using the scientific method and mathematical tools because they can be reproduced consistently every time. In the so-called STEM disciplines, we discover natural laws and mathematical theorems and keep building on our understanding of Nature. We can create hypotheses, design experiments and study their results, with the expectation that, when we repeat the experiments, the results will be substantially the same.

But when it comes to Social phenomena, we are far less clear about what tools and methods to use. We certainly use the ones we have used in Science, but they do not seem to produce the same concrete understanding that we enjoy with Nature. Humans may not always behave in the same, predictable ways and thus our experiments may not be easily reproducible.

What have we learned so far about Social phenomena from studying the data we collect on the Social Web? Below are three Social Theorems I have encountered in the research areas I am studying. I call them “Social Theorems” because, unlike mathematical Theorems, they are not expected to apply consistently in every situation; they apply most of the time, and when enough attention has been paid by enough people. Proving a Social Theorem involves providing enough evidence of its validity, along with a description of its exceptions (situations where it does not apply). It is also important to have a theory, an explanation, of why it is true. Disproving one involves showing that a significant number of counter-examples exists. It is not enough to have a single counter-example to disprove a Social Theorem, as people are able to create one just for fun. One has to show that at least a significant minority of all cases related to a Social Theorem are counter-examples.
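To make this evidence standard concrete, here is one possible (admittedly simplistic) formalization in code. The 10% threshold for what counts as a "significant minority" is my own illustrative choice, not something established in the research:

```python
def social_theorem_stands(cases, is_counter_example, threshold=0.10):
    """A Social Theorem 'stands' while counter-examples stay a small fraction.

    One counter-example is not enough to refute it; a significant
    minority (here, an assumed 10% of all observed cases) would be.
    """
    counters = sum(1 for c in cases if is_counter_example(c))
    return counters / len(cases) < threshold

# Toy data: each case is True if it contradicted the theorem.
observed = [True] * 3 + [False] * 97      # 3% counter-examples
print(social_theorem_stands(observed, lambda c: c))  # True
```

The interesting (and hard) part, of course, is deciding what a "case" is and how to label it, which the code above deliberately leaves abstract.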

SoThm 1. Re-tweets (unedited) about political issues indicate agreement and reveal communities of like-minded people.

SoThm 2. Given enough time and people’s attention, lies have short questioned lives.

SoThm 3. People with open minds and critical thinking abilities are better at figuring out truth than those without. (Technology can help in the process.)

So, what evidence do we have so far about the validity of these Social Theorems? Since this is simply a blog, I will try to outline the evidence with a couple of examples. I am currently working on a longer version of this blog, and your feedback is greatly appreciated.

Evidence for SoThm1.

There are a couple of papers that present evidence that “Re-tweets (unedited) about political issues indicate agreement, reveal communities of like-minded people.” The first is the From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search paper that I co-authored with Eni Mustafaraj and presented at the WebScience 2010 conference. When we looked at the 200 most active Twitter users tweeting about the 2010 MA Special Senatorial election (those who sent at least 100 tweets in the week before the elections), we found that their re-tweets revealed their political affiliations. First, we characterized each of them as liberal or conservative based on their profiles and their tweets. Then we looked at how they were retweeting. In fact, 99% of the conservatives were retweeting only other conservatives’ messages, and 96% of the liberals only those of other liberals.

Then we looked at the retweeting patterns of the 1,000 most active accounts (those who sent at least 30 tweets in the week before the elections) and we obtained the graph below:

As you may have guessed, the liberals and conservatives are mostly re-tweeting the messages of their own folk. And it makes sense: the act of re-tweeting has the effect of spreading a message to your own followers. If a liberal or conservative re-tweets (that is, repeats a message without modification), he or she wants this message to spread. In a politically charged climate, e.g., before an important election, he or she will not be willing to spread a message they disagree with.
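The 99%/96% figures above can be thought of as within-group retweet fractions. A minimal sketch of that computation follows; the users, labels, and retweet pairs are made up, standing in for the real data and the manual labeling from the paper:

```python
from collections import defaultdict

# Invented data: a group label per user, and (retweeter, author) pairs.
group = {"ann": "lib", "bob": "lib", "cat": "con", "dan": "con"}
retweets = [("ann", "bob"), ("ann", "bob"), ("ann", "cat"),
            ("cat", "dan"), ("dan", "cat")]

within = defaultdict(int)   # retweets staying inside the group
total = defaultdict(int)    # all retweets made by the group

for retweeter, author in retweets:
    g = group[retweeter]
    total[g] += 1
    if group[author] == g:
        within[g] += 1

for g in sorted(total):
    print(g, round(within[g] / total[g], 2))  # con 1.0, lib 0.67
```

A fraction near 1.0 for each group is exactly the polarized pattern the graph shows.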

The second piece of evidence comes from the paper “Political Polarization on Twitter” by Conover et al., presented at the 2011 ICWSM conference. The retweeting pattern, shown below, also indicates a highly polarized environment.

In both cases, the pattern of user behavior does not apply 100% of the time, but it does apply most of the time. That is what makes this a Social Theorem.

Evidence for SoThm2.

The “Given enough time and people’s attention, lies have short questioned lives” Social Theorem describes a more interesting phenomenon, because people tend to worry that lies are somehow much more powerful than truths. This worry stems mostly from our wish that no lie ever wins out, even though we each know several lies that have survived. (For example, one could claim that there are several major religions in existence today that are propagating major lies.)

In our networked world, things are better, the evidence indicates. The next table comes from the “Twitter Under Crisis: Can we trust what we RT?” paper by Mendoza et al., presented at the SOMA 2010 Meeting. The authors examined some of the false and true rumors circulated after the Chilean earthquake in 2010. What they found is that rumors about confirmed truths had very few “denies” and were not questioned much during their propagation. On the other hand, confirmed false rumors were both questioned a lot and denied much more often (see the last two columns, enclosed in red rectangles). Why does this make sense? Large crowds are not easily fooled, as the research on crowd sourcing has indicated.
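The pattern in that table can be summarized as a "pushback rate": the share of reactions that question or deny a rumor. The counts below are invented for illustration and only mimic the shape of Mendoza et al.'s findings, not their actual numbers:

```python
# Invented reaction counts per rumor type (not the actual data).
reactions = {
    "confirmed truth": {"affirm": 95, "deny": 2, "question": 3},
    "false rumor":     {"affirm": 40, "deny": 35, "question": 25},
}

for rumor, counts in reactions.items():
    n = sum(counts.values())
    pushback = (counts["deny"] + counts["question"]) / n
    print(f"{rumor}: {pushback:.0%} pushback")
# confirmed truth: 5% pushback
# false rumor: 60% pushback
```

A large gap in pushback rates between true and false rumors is the crowd's questioning at work.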

Again, these findings do not claim that no lies will ever propagate, but that they will be confronted, questioned, and denied by others as they propagate. By comparison, truths will have a very different experience in their propagation.

The next piece of evidence comes from the London Riots of August 2011. At the time, members of the UK government accused Twitter of spreading rumors and suggested it should be restricted in crises like these. The team that collected and studied rumor propagation on Twitter found that this was not the case: false rumors were, again, short-lived and often questioned during the riots. In a great interactive tool, the Guardian shows in detail the propagation of 7 such false rumors. I reproduce below an image of one of them; the interested reader should take a closer look at the Guardian link.



During the Truthiness symposium, another case was presented, one that supposedly shows the flip side of this social theorem: that “misinformation has longer life, further spread on Twitter than accompanying corrections”. I copy the graph that supposedly shows this, for reference.

Does this mean that the Social Theorem is wrong? Recall that a Social Theorem cannot be refuted by a single counter-example, but only by demonstrating that at least a significant minority of counter-examples exists.

Further, the above example may not be as bad as it looks at first. First, note that the graph shows that the false spreading had a short life; it did not last more than a few hours. Moreover, note that the false rumor’s spread was curbed as soon as the correction came out (see the red vertical line just before 7:30 PM). This indicates that the correction probably had a significant effect in curbing the false information, which might otherwise have continued to spread at its earlier rate.


Evidence for SoThm3.

I must admit that “People with open minds and critical thinking abilities are better at figuring out truth than those without” is a Social Theorem that I would like to be correct, and that I believe to be correct, but I am not sure how exactly to measure it. It makes sense: after all, our educational systems since the Enlightenment have been based on it. But how exactly do you create controlled experiments to prove or disprove it?

Here, Dear Reader, I ask for your suggestions.



Misinformation and Propaganda in Cyberspace


Dear Readers,

The following is a blog post that I wrote recently for a conference on “Truthiness in Digital Media” organized by the Berkman Center in March. It summarizes some of the research findings that have shaped my approach to the serious challenges that misinformation propagation poses in Cyberspace.

Do you have examples of misinformation or propaganda that you have seen on the Web or on Social Media? I would love to hear from you.

Takis Metaxas


Misinformation and Propaganda in Cyberspace

Since the early days of the discipline, Computer Scientists have always been interested in developing environments that exhibit well-understood and predictable behavior. If a computer system were to behave unpredictably, we would look into the specifications and would, in theory, be able to detect what went wrong, fix it, and move on. To this end, the World Wide Web, created by Tim Berners-Lee, was not expected to evolve into a system with unpredictable behavior. After all, the creation of the WWW was enabled by three simple ideas: the URL, a globally addressable system of files; HTTP, a very simple communication protocol that allowed a computer to request and receive a file from another computer; and HTML, a document-description language to simplify the development of documents that are easily readable by non-experts. Why, then, within a few years did we start to see technical papers that included terms such as “propaganda” and “trust”?

Soon after its creation the Web began to grow exponentially because anyone could add to it. Anyone could be an author, without any guarantee of quality. The exponential growth of the Web necessitated the development of search engines (SEs) that gave us the opportunity to locate information fast. They grew so successful that they became the main providers of answers to any question one may have. It does not matter that several million documents may all contain the keywords we included in our query; a good search engine will give us the important ones in its top-10 results. We have developed a deep trust in these search results because we have so often found them to be valuable — or, when they are not, we might not notice it.

As SEs became popular and successful, Web spammers appeared. These are entities (people, organizations, businesses) who realized that they could exploit the trust that Web users place in search engines. They would game the search engines, manipulating the quality and relevance metrics so as to force their own content into the ever-important top-10 of a relevant search. The search engines noticed this, and a battle with the Web spammers ensued: for every good idea that search engines introduced to better index and retrieve web documents, the spammers would come up with a trick to exploit the new situation. When the SEs introduced keyword frequency for ranking, the spammers came up with keyword stuffing (lots of repeated keywords to give the impression of high relevance); for web site popularity, they responded with link farms (lots of interlinked sites belonging to the same spammer); in response to the descriptive nature of anchor text, they detonated Google bombs (using irrelevant keywords as anchor text to target a web site); and for the famous PageRank, they introduced mutual admiration societies (collaborating spammers exchanging links to increase everyone’s PageRank). In fact, one can describe the evolution of search-results ranking technology as a response to Web spamming tricks. And since for each spamming technique there is a corresponding propagandistic one, the spammers became the propagandists of cyberspace.
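To see why a "mutual admiration society" works, consider a toy power-iteration PageRank (my own minimal implementation, not Google's actual algorithm): a target page with no genuine inlinks can outrank honest pages once a small farm of pages exchanges links with it. The graph and damping factor below are illustrative:

```python
def pagerank(links, damping=0.85, iters=50):
    """Toy PageRank by power iteration over an adjacency dict."""
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            targets = links[u] or nodes        # dangling page: spread evenly
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        rank = new
    return rank

# Honest pages a, b link only to each other; "target" has no genuine
# inlinks but exchanges links with a three-page farm (f1-f3).
links = {
    "a": ["b"], "b": ["a"],
    "target": ["f1", "f2", "f3"],
    "f1": ["target"], "f2": ["target"], "f3": ["target"],
}
ranks = pagerank(links)
print(ranks["target"] > ranks["a"])  # True: the farm inflated the target
```

The farm pages funnel all of their rank into the target, which is exactly the behavior the real PageRank had to be hardened against.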

Around 2004, the first elements of misinformation around elections started to appear, and political advisers recognized that, even though the Web was not a major component of electoral campaigns at the time, it would soon become one. If they could just “persuade” search engines to rank positive articles about their candidates highly, along with negative articles about their opponents, they could convince a few casual Web users that their message was the more valid one and get their votes. Elections in the US, after all, often depend on a small number of closely contested races.

Search engines have certainly tried hard to limit the success of spammers, who are seen as exploiting this technology to achieve their goals. Search results were adjusted to be less easily spammable, even if this meant that some results were hand-picked rather than algorithmically produced. In fact, during the 2008 and 2010 elections, searching the Web for electoral candidates would yield results that contained official entries first: the candidates’ campaign sites, the office sites, and Wikipedia entries topped the results, well above even well-respected news organizations. The embarrassment of being gamed, and of the infamous “miserable failure” Google bomb, would not be tolerated.

Around the same time we saw the development of the Social Web, networks that allow people to connect, exchange ideas, air opinions, and keep up with their friends. The Social Web created opportunities for spreading both political (and other) messages and misinformation through spamming. In our research we have seen several examples of the propagation of politically motivated misinformation. During the important 2010 Special Senatorial election in MA, spammers used Twitter to create a Google bomb that would bring their own messages into the third position of the top-10 results by frequently repeating the same tweet. They also created the first Twitter bomb, targeting individuals interested in the MASEN elections with misinformation about one of the candidates, and created a pre-fab Tweet factory imitating a grass-roots campaign, attacking news organizations and reporters (a technique known as “astroturfing”).

Like propaganda in society, spam will stay with us in cyberspace. And as we increasingly look to the Web for information, it is important that we are able to detect misinformation. Arguably, now is the most important time for such detection, since we do not currently have a system of trusted editors in cyberspace like that which has served us well in the past (newspapers, publishers, institutions). What can we do?

* Retweeting reveals communities of like-minded people: two large groups naturally arise when one considers the retweeting patterns of those tweeting during the 2010 MA special election. Sampling reveals that the smaller contains liberals and the larger conservatives. The larger one appears to consist of 3 different subgroups.

Some promising research in social media has shown potential in using technology to detect astroturfing. In particular, the following rules hold true most (though not all) of the time:

  1. The credibility of the information you receive is related to the trust you have in the original sender and in those who retweeted it.
  2. Not only do Twitter friends (those that you follow) reveal a similarly-minded community, their retweeting patterns make these communities stronger and more visible.
  3. While both truths and lies propagate in cyberspace, lies have shorter life-spans and are questioned more often.
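As a sketch of how rule 1 might be operationalized, here is one naive scoring function. The trust values in [0, 1] and the 60/40 weighting are assumptions of mine for illustration, not something proposed in the research:

```python
def credibility(author_trust, retweeter_trusts, author_weight=0.6):
    """Blend trust in the author with trust in those who retweeted.

    All trust values are assumed to lie in [0, 1]; the 60/40 weighting
    (the author counts most) is an arbitrary illustrative choice.
    """
    if not retweeter_trusts:
        return author_trust
    avg_rt = sum(retweeter_trusts) / len(retweeter_trusts)
    return author_weight * author_trust + (1 - author_weight) * avg_rt

# A claim from a low-trust author gains only a little, even when
# retweeted by accounts you trust highly.
print(round(credibility(0.2, [0.9, 0.8]), 2))  # 0.46
```

Any real system would also have to estimate the trust values themselves, which is the genuinely hard part.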

While we might prefer an automatic way of detecting misinformation with the use of algorithms, this will not happen. Citizens of cyberspace must become educated about how to detect misinformation, be provided with tools that will help them question and verify information, and draw on the strengths of crowd sourcing through their own groups of trusted editors. This Berkman conference will help us push in this direction.


Social Experiments: People vs Machines and In-lab vs Online


Social Experiments: People vs Machines

Recently, I attended a couple of talks on conducting social experiments. I found them both very interesting for different reasons, and thought of giving you an overview in this posting.

The first talk was at MIT. The Dertouzos Lecture was established after the death of MIT LCS Director Michael Dertouzos who, even though he left us early, left behind a great legacy. Given Dertouzos’s strong interest in the interdisciplinary nature of Computer Science, Prof. Michael Kearns of UPenn was a particularly appropriate choice. Here is the abstract of Michael’s talk:

“What do the theory of computation, economics and related fields have to say about the emerging phenomena of crowd sourcing and social computing? Most successful applications of crowd sourcing to date have been on problems we might consider “embarrassingly parallelizable” from a computational perspective. But the power of the social computation approach is already evident, and the road cleared for applying it to more challenging problems. In part towards this goal, for a number of years we have been conducting controlled human-subject experiments in distributed social computation in networks with only limited and local communication. These experiments cast a number of traditional computational problems — including graph coloring, consensus, independent set, market equilibria, biased voting and network formation — as games of strategic interaction in which subjects have financial incentives to collectively “compute” global solutions. I will overview and summarize the many behavioral findings from this line of experimentation, and draw broad comparisons to some of the predictions made by the theory of computation and microeconomics.”

Michael is interested in exploring how effectively people can crowd source solutions in the lab when presented with a variety of problems, from the computationally easy to the hard. Graph coloring is a hard problem for a computer (i.e., no efficient sequential or parallel algorithm for it is known). How well would 36 undergraduate students solve instances of graph coloring? Quite well, it turns out. See the video clip.

Finding consensus (e.g., having all nodes in a graph choose the same color) is an easy problem for both sequential and parallel algorithms. Yet, when presented with a time limit, humans have trouble reaching consensus, as they are not able to come up consistently with a successful strategy: some will change colors often, trying to accommodate their neighbors; others will stick stubbornly to their color, expecting others to follow them; yet others will flip-flop a lot, giving up at the wrong moment. Experience does not seem to help: playing this game over and over seems to teach them little. See this video clip of 36 undergraduates trying to reach consensus on a graph composed of highly interconnected tribes.

The two video clips I recorded on my iPad during his talk are only a small teaser of the work Michael Kearns presented. If you are interested, you should take a closer look at his published papers.
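A tiny simulation hints at why the consensus game is hard on such graphs: if every node simply adopts the majority color among its neighbors (one plausible naive strategy), two tightly knit tribes joined by a single edge never converge. The graph and the update rule here are my own illustration, not taken from Kearns's experiments:

```python
# Two triangles ("tribes") joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
neighbors = {i: set() for i in range(6)}
for u, v in edges:
    neighbors[u].add(v)
    neighbors[v].add(u)

# One tribe starts red, the other blue.
color = {i: "red" if i < 3 else "blue" for i in range(6)}

for _ in range(20):                    # many rounds of local updates
    for i in range(6):
        counts = {}
        for j in neighbors[i]:
            counts[color[j]] = counts.get(color[j], 0) + 1
        color[i] = max(counts, key=counts.get)   # adopt the local majority

print(sorted(set(color.values())))     # ['blue', 'red'] -- no consensus
```

Each tribe reinforces itself, so the naive rule deadlocks, much like the stubborn players in the experiments; breaking the deadlock requires someone to deviate from the local majority at the right moment.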

Social experiments in the lab vs online

The second talk was at the Berkman Center for Internet and Society. Fellow Jerome Hergueux’s talk was entitled “The Promises of Web-based Social Experiments.” He is interested in exploring how closely the results of experiments conducted online match those conducted in the lab. Here is the abstract of his talk:

“The advent of the internet provides social scientists with a fantastic tool for conducting behavioral experiments online at a very large scale and at an affordable cost. It is surprising, however, how little research has leveraged the affordances of the internet to set up such social experiments so far. In this talk, Jerome Hergueux will introduce the audience to one of the first online platforms specifically designed for conducting interactive social experiments over the internet to date. He will present the preliminary results of a randomized experiment that compares behavioral measures of social preferences obtained both in a traditional University laboratory and online, with a focus on engaging the audience in a reflection about the specificities, limitations and promises of online experimental economics as a tool for social science research.”

Jerome and his colleagues at the University of Paris tried to re-create online, as closely as possible, the environment of the labs that social scientists have long used. They recruited subjects from the very same pool, asking some of them to participate in experiments in a lab setting while others participated in the very same experiments online. There were no interactions between the participants, though the ones in the lab could see who else had come for the experiments. What they found was that the results of the experiments differ! In particular, the online subjects seemed to be significantly more social than those in the lab: more altruistic, showing higher trust, and less risk averse. While this is still preliminary work, it seems quite promising in giving us a better understanding of the transformation we undergo when we go online. You can watch Jerome Hergueux’s full talk on the Berkman site.

We still have a lot to learn about conducting social experiments, but these two talks are definitely helping in this direction.


Election time, and the predicting is easy…



As I am sure you have heard, the Iowa caucus results are in. Several journalists are reporting on the elections along with claims of “predictions” that social media are supposedly making. And the day after the Iowa caucus, they are wondering whether Twitter predicted correctly or not. And they look to the “professionals” for advice, such as Globalpoint, Sociagility, Socialbackers and other impressive-sounding companies.

Shepard Fairey meets Angry Birds: Poster of our 2011 ICWSM submission "Limits of Electoral Predictions using Twitter"

Well, Twitter did not get it right. That is not surprising to my co-authors and me. Yet, they try to find a silver lining by claiming smaller predictive successes, such as “anticipating Santorum’s excellent performance” better than the national polls did. Of course, the fact that Twitter mismatched the results for the other five candidates is ignored. Why can’t they see that?

A few years ago I had created a questionnaire to help my students sharpen their critical thinking skills. One question that the vast majority got right was the following: “Is Microsoft the most creative tech company?” If one were to do a Web search on this question, the first hit (the “I’m Feeling Lucky” button) would be Microsoft’s own Web page, because it had as its title “Microsoft is the most creative tech company.” My students realized that Microsoft may not be providing an unbiased answer to this question, and ignored it.

It is exactly this critical thinking principle that journalists obsessed with election predictions are getting wrong: the companies I mentioned above (Globalpoint, Sociagility, Socialbackers) are all in the business of making money by promising magical abilities in their own predictions and metrics. One should not take their claims at face value because they have a financial conflict of interest that rewards misleading answers (e.g., “Comparing our study data with polling data from respected independent US political polling firm Public Policy Polling, we discovered a strong, positive correlation between social media performance and voting intention in the Iowa caucus.” Note that even after the elections they talk about intentions, not results.)
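To make the correlation-versus-prediction distinction concrete, here is an illustrative sketch with entirely invented numbers (only the candidate names are real): a social-media “score” can correlate strongly with vote shares across candidates and still call the winner wrong.

```python
# Illustrative only: every figure below is invented (only the candidate
# names are real). A social-media "score" can correlate strongly with
# vote shares and still pick the wrong winner.
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

candidates  = ["Romney", "Santorum", "Paul", "Gingrich", "Perry", "Bachmann"]
media_score = [26, 20, 28, 13, 9, 5]    # hypothetical social-media metric
vote_share  = [24, 25, 21, 13, 10, 5]   # hypothetical vote shares (%)

r = pearson(media_score, vote_share)
predicted = candidates[media_score.index(max(media_score))]
actual = candidates[vote_share.index(max(vote_share))]
print(f"r = {r:.2f}; the metric picks {predicted}, the winner is {actual}")
```

A reporter who stops at the high correlation would call this a success; a reader applying the critical thinking principle above would ask who actually won.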

That’s not the only example violating this basic critical thinking principle I saw today. Earlier, I had received a tweet that “Americans more susceptible to online scams than believed, study finds”. The article reports that older, rich, highly educated men from the Midwest, politically affiliated with the Green Party are far less susceptible to scams than young, poor, high school dropout women from the Southwest who support Independents. If you read the “study” findings, you will be even more confused about the quality of this study. A closer look reveals that the “study” was done by PC Tools, a company selling “online security and system utility software.” Apparently, neither the vagueness of the “survey” nor the financial conflict of interest of the surveying company raised any flags for the reporter.

In the Web era, information is finding us, not the other way around. Being able to think critically will be crucial.



Submit to KI Journal on Social Media


The deadline for the Special Issue on Social Media for the German Journal of Artificial Intelligence (Kuenstliche Intelligenz) is coming up. We invite you to submit your article by the FEBRUARY 6, 2012 deadline. It will be published later this year by Springer.

If you intend to submit but the deadline is too close to make, send me an email!

The Theme of the Special Issue

Social Media has led to radical paradigm shifts in the ways we communicate, collaborate, consume, and create information. Technology allows virtually anyone to disseminate information to a global audience almost instantaneously. Information published by peers in the form of Tweets, blog posts, or Web documents through online social networking services has proliferated on an unprecedented scale, contributing to an exponentially growing data deluge. A new level of connectedness among peers adds new ways for the consumption of (traditional) media. We are witnessing new forms of collaboration, including the phenomenon of an emergent ‘collective intelligence’. This intelligence of crowds can be harnessed in myriad ways, ranging from outsourcing simple, repetitive tasks on Amazon Mechanical Turk, to solving complex challenges such as proving a mathematical theorem creatively and collaboratively.

This call for papers welcomes contributions showing:

  1. How to make sense of Social Media data, i.e., how to condense, distill, or integrate highly decentralized and dispersed data resulting from human communication (including sensor-collected data) into a meaningful entity or information service, or
  2. How Social Media contributes to innovation, collaboration, and collective intelligence.

We invite papers covering all aspects of Social Media analysis including Social Media in Business (especially for Marketing, Innovation, and Collaboration), Entertainment (especially Social News, Social Music Services, Social TV, and Social Network Games), as well as Art (e.g. City Installations). Applications of Social Media in art may be understood as a playing field for translating highly decentralized ‘social data’ into centralized forms of artful expression, thus furthering our intuitive understanding of these complex emergent phenomena.

List of topics

The list of topics mentioned below is neither exhaustive nor exclusive. Insightful artifacts and methods as well as analytical, conceptual, empirical, and theoretical approaches (using any kind of research method, including experiments, primary data from social media logs, case studies, simulations, surveys, and so on) are within the scope. Practical project descriptions and innovative software are also of high interest to the readers of KI.

  •  Information/Web mining (e.g. opinion mining)
  •  Prognosis (e.g. trend and hot topic identification)
  •  Collective Intelligence
  •  Crowd sourcing
  •  Swarm Creativity, Collaborative Innovation Networks
  •  (Dynamic) Social Media Monitoring
  •  Sentiment, Natural Language Processing
  •  Social Media within and for Smart Cities, Smart Traffic, Smart Energy
  •  Social Networks for the collaboration of large communities
  •  User behavior, social interaction
  •  Social Network Analysis (SNA), semantic network analysis
  •  Social search engines and aggregators
  •  Social network games
  •  Personalization and adaptation to user preference
  •  Trust, reputation, social control, privacy
  •  Information reliability, Web spam, content authenticity (e.g., detecting “astroturfing”)


  • Submissions open until February 6, 2012 (extended)
  • Camera-ready copies of revised papers by April 30, 2012
  • Pre-Publication of accepted papers via Springer Online First in June 2012
  • Printed version of this Special Issue: Fall 2012

In addition to complete research papers, this Special Issue will accept project and dissertation reports  as well as discussion and conference reports in order to provide a comprehensive overview of the  current activities in this area.

Guest Editors

  • Detlef Schoder, Prof. Dr., schoder at wim.uni-,
    University of Cologne (Koeln), Department of Information Systems and Information Management, Cologne, Germany
  • Peter A. Gloor, PhD, pgloor at,
    MIT Sloan School of Management, Center for Collective Intelligence, Cambridge, MA, USA
  • Panagiotis Takis Metaxas, PhD, Prof., pmetaxas at,
    Wellesley College, Department of Computer Science, Wellesley, MA, and Harvard University, Center for Research on Computation and Society, Cambridge, MA, USA

For inquiries and submissions please contact:

Prof. Dr. Detlef Schoder
University of Cologne (Koeln),
Department of Information Systems and Information Management,
Pohligstrasse 1, D-50969
Phone: +49 / (0)221 470-5325,
Fax: +49 / (0)221 470-5393,
URL: http://www.wim.uni-,
Email: schoder at wim.uni-


Determining the trustworthiness of what we read online is important.


Yesterday I was informed that Senator Tom Coburn published a report entitled “Wastebook, A Guide to Some of the Most Wasteful and Low Priority Government Spending of 2011”, which included my NSF grant “Trails of Trustworthiness: Understanding and Supporting Quality of Information in Real-Time Streams” as example number 34.

I was not familiar with Senator Coburn’s publication and was surprised and curious to see what would characterize my project as “wasteful”. My colleagues and I have been working on the problem of information reliability over the last 5 years and we have published more than a dozen papers in refereed conferences and journals, three of which have received the “Best Paper” distinction. What was it that Senator Coburn found so unacceptable that reviewers and scientific audiences overlooked? After reading the relevant sections of his publication, I was even more confused as to what the Senator deemed objectionable. Everything he mentions regarding my project seems positive:

Do you trust your twitter feed? The National Science Foundation is providing $492,055 in taxpayer dollars to researchers at Wellesley College to answer that question. Researchers cite “the tremendous growth of the so-called Social Web” as a factor that will “put new stress to human abilities to act under time pressure in making decisions and determine the quality of information received.” Their work will analyze the “trails of trustworthiness” that people leave on information channels like Twitter. They will study how users mark certain messages as trustworthy and how they interact with others whose “trust values” are known. The NSF grant also includes funding for an online course to study “what critical thinking means in our highly interconnected world,” in which we might be “interacting regularly with people we may never meet”.

However, the report condescendingly titles its entry “To Trust or Not to Trust Tweets, That is the Question.” This suggests that the author of the report may think that trust in online communication is not worth studying, or that Twitter is unworthy of mention in a scientific proposal. But to those who have actually read the details of the proposal, this is a superficial criticism. What we are proposing to do is to create semi-automatic methods for helping people determine the credibility of the information they receive online. From recent events in the Arab world, Russia, and Mexico, for example, we know that people look to online media for information they can trust, while oppressive governments and drug cartels try to confuse them by spreading misinformation. Even in the US, the cost of misinformation is high: investors have lost millions because of untrustworthy online information, and little-known groups are trying to influence our elections by spreading lies. Being able to determine what information can be trusted has always been important and will be critical in the future.
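As a purely hypothetical illustration (this is not the method of our proposal), one naive way to use known “trust values” is to score a message by the trust of the users who endorse it:

```python
# Purely hypothetical sketch, NOT the method of our proposal: score a
# message's credibility as the average trust value of its endorsers,
# with a neutral prior of 0.5 for users whose trust is unknown.
def message_credibility(endorsers, trust, prior=0.5):
    """Trust-weighted credibility of a message endorsed by `endorsers`."""
    if not endorsers:
        return prior
    return sum(trust.get(u, prior) for u in endorsers) / len(endorsers)

trust = {"alice": 0.9, "bob": 0.8, "mallory": 0.1}  # assumed trust values

print(message_credibility(["alice", "bob"], trust))    # endorsed by trusted users
print(message_credibility(["mallory", "eve"], trust))  # low and unknown trust
```

Any serious system would have to go well beyond an average, of course: accounting for manipulation, for how trust values themselves are earned, and for the network of interactions among users.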

It’s unlikely that Senator Coburn himself actually read thousands of NSF grant descriptions to determine which ones appear wasteful. Furthermore, such proposals are written for a scientific audience and require specific expertise to evaluate. And I am sure that the Senator does not believe that critical thinking education and technologies for supporting trust and credibility are “wasteful”. So how did this proposal end up in his report?


On the Senator’s “Wastebook” web page, there is a link next to a picture of Uncle Sam inviting readers to “Submit a tip about Government Waste”. By clicking on it, one can suggest examples of wasteful spending to the Senator. I wouldn’t be surprised if someone with only a cursory understanding of our proposal recommended it as wasteful. In this case — and perhaps in many others — a provider of online information has misled Senator Coburn. Therefore, this report itself is proof that determining the trustworthiness of what we read online is important.


Predict the Future (and Tell the World about it!)


In my previous posting (Predict the Future!) I argued about the benefits and risks of making predictions using data gathered from social media. I will take this opportunity to mention a Call for Papers that I am involved in. The online journal “Internet Research”, famous for having published the original article by Tim Berners-Lee on the creation of the WWW, is preparing a special issue on “The Power of Prediction with Social Media”, to be published in 2012. Below are the details. If you have any questions, please contact me or any of the other guest editors.


Special issue call for papers on
“The Power of Prediction with Social Media”
from Internet Research, ISSN: 1066-2243

Editor in Chief: Jim Jansen


Social media today provide an impressive amount of data about users and their societal interactions, thereby offering computer scientists, social scientists, economists, and statisticians many new opportunities for research exploration. Arguably one of the most interesting lines of work is that of forecasting future events and developments based on social media data, as we have recently seen in the areas of politics, finance, entertainment, market demands, health, etc.

But what can successfully be predicted and why? Since the first algorithms and techniques emerged rather recently, little is known about their overall potential, limitations and general applicability to different domains.

Better understanding the predictive power and limitations of social media is therefore of utmost importance, in order to avoid, for example, false expectations, misinformation or unintended consequences. Current methods and techniques are far from well understood, and it is mostly unclear to what extent, or under what conditions, the different methods for prediction can be applied to social media. While there exists a respectable and growing body of literature in this area, current work is fragmented and characterized by a lack of common evaluation approaches. Yet, this research seems to have reached a sufficient level of interest and relevance to justify a dedicated special issue.

This special issue aims to shape a vision of important questions to be addressed in this field and fill the gaps in current research by soliciting presentations of early research on algorithms, techniques, methods and empirical studies aimed at the prediction of future or present events based on user generated content in social media.


To address this guiding theme the special issue will be articulated around, but not limited to, the following topics:

  1. Politics, branding, and public opinion mining (e.g., electoral, market or stock market prediction).
  2. Health, mood, and threats (e.g., epidemic outbreaks, social movements).
  3. Methodological aspects (e.g., data collection, data sampling, privacy and data de-identification).
  4. Success and failure case studies (e.g., reproducibility of previous research or selection of baselines).


  • Manuscript due date: June 1, 2012
  • Decisions due: August 1, 2012
  • Revised paper due: September 15, 2012
  • Notification of acceptance: October 1, 2012
  • Submission of final manuscript: October 31, 2012
  • Publication date: late 2012 / early 2013 (tentative)


All submitted manuscripts should be original contributions and not be under consideration in any other venue.

Publication of an enhanced version of a previously published conference paper is possible if the review process determines that the revision contains significant enhancements, amplification or clarification of the original material. Any prior appearance of a substantial amount of a submission should be noted in the submission letter and on the title page.

Submissions must adhere to the journal’s “Author Guidelines”.

Detailed instructions will be announced later this year.

Guest editors

