Misinformation and Propaganda in Cyberspace

Dear Readers,

The following is a blog post that I wrote recently for a conference on “Truthiness in Digital Media” organized by the Berkman Center in March. It summarizes some of the research findings that have shaped my approach to the serious challenges that the propagation of misinformation poses in Cyberspace.

Do you have examples of misinformation or propaganda that you have seen on the Web or on Social Media? I would love to hear from you.

Takis Metaxas


Misinformation and Propaganda in Cyberspace

Since the early days of the discipline, computer scientists have been interested in developing environments that exhibit well-understood and predictable behavior. If a computer system were to behave unpredictably, we would look into its specifications and, in theory, be able to detect what went wrong, fix it, and move on. The World Wide Web, created by Tim Berners-Lee, was likewise not expected to evolve into a system with unpredictable behavior. After all, the creation of the WWW was enabled by three simple ideas: the URL, a globally addressable system of files; HTTP, a very simple communication protocol that allows a computer to request and receive a file from another computer; and HTML, a document-description language that simplifies the creation of documents easily readable by non-experts. Why, then, within a few years did we start to see technical papers that included terms such as “propaganda” and “trust”?

Soon after its creation, the Web began to grow exponentially because anyone could add to it. Anyone could be an author, without any guarantee of quality. This exponential growth necessitated the development of search engines (SEs), which gave us the ability to locate information fast. They became so successful that they are now the main providers of answers to any question one may have. It does not matter that several million documents may contain the keywords we included in our query; a good search engine will give us the important ones in its top-10 results. We have developed a deep trust in these search results because we have so often found them to be valuable, and when they are not, we may not even notice.

As SEs became popular and successful, Web spammers appeared. These are entities (people, organizations, businesses) who realized that they could exploit the trust that Web users place in search engines. They would game the search engines by manipulating the quality and relevance metrics so as to force their own content into the ever-important top-10 of a relevant search. The search engines noticed this, and a battle with the Web spammers ensued: for every good idea the search engines introduced to better index and retrieve Web documents, the spammers would come up with a trick to exploit the new situation. When the SEs introduced keyword frequency for ranking, the spammers responded with keyword stuffing (lots of repeated keywords to give the impression of high relevance); for web-site popularity, they responded with link farms (lots of interlinked sites belonging to the same spammer); in response to the descriptive nature of anchor text, they detonated Google bombs (using irrelevant keywords as anchor text to target a web site); and for the famous PageRank, they introduced mutual admiration societies (collaborating spammers exchanging links to increase everyone’s PageRank). In fact, one can describe the evolution of search-result ranking technology as a series of responses to Web-spamming tricks. And since each spamming technique has a corresponding propagandistic one, the spammers became the propagandists of cyberspace.
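To see concretely why a mutual admiration society pays off, here is a minimal PageRank sketch. The graph, node names, damping factor, and iteration count are all illustrative assumptions, not data from any real spam campaign; the point is only that a small clique of interlinked spam pages keeps recirculating rank among its members.

```python
# Minimal power-iteration PageRank (hypothetical toy graph).
def pagerank(graph, damping=0.85, iterations=100):
    """`graph` maps each node to the list of nodes it links to."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for node, out_links in graph.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    new[target] += share
            else:  # dangling node: spread its rank evenly over all nodes
                for target in nodes:
                    new[target] += damping * rank[node] / n
        rank = new
    return rank

# An honest page linked once by a legitimate site, next to a "mutual
# admiration society": three spam pages that all link to one another
# and to the spammer's target page.
web = {
    "legit_site": ["honest_page"],
    "honest_page": [],
    "spam_a": ["spam_b", "spam_c", "spam_target"],
    "spam_b": ["spam_a", "spam_c", "spam_target"],
    "spam_c": ["spam_a", "spam_b", "spam_target"],
    "spam_target": ["spam_a", "spam_b", "spam_c"],
}
ranks = pagerank(web)
```

Because the four spam nodes form a closed clique, most of the rank mass that enters it stays inside, so `spam_target` ends up ranked well above the honest page despite having no legitimate endorsements.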

Around 2004, the first elements of misinformation around elections started to appear, and political advisers recognized that, even though the Web was not a major component of electoral campaigns at the time, it would soon become one. If they could just “persuade” search engines to rank positive articles about their candidates highly, along with negative articles about their opponents, they could convince a few casual Web users that their message was more valid and win their votes. Elections in the US, after all, often depend on a small number of closely contested races.

Search engines have certainly tried hard to limit the success of spammers, who are seen as exploiting this technology to achieve their goals. Search results were adjusted to be less easily spammable, even if this meant that some results were hand-picked rather than algorithmically produced. In fact, during the 2008 and 2010 elections, searching the Web for electoral candidates would yield results with the official entries first: the candidates’ campaign sites, their office sites, and Wikipedia entries topped the results, well above even well-respected news organizations. The embarrassment of being gamed, as in the infamous “miserable failure” Google bomb, would not be tolerated.

Around the same time we saw the development of the Social Web, networks that allow people to connect, exchange ideas, air opinions, and keep up with their friends. The Social Web created opportunities for spreading political (and other) messages, but also for spreading misinformation through spamming. In our research we have seen several examples of the propagation of politically motivated misinformation. During the important 2010 special senatorial election in MA, spammers used Twitter to create a Google bomb that brought their own messages into the third position of the top-10 results by frequently repeating the same tweet. They also created the first Twitter bomb, targeting individuals interested in the MASEN elections with misinformation about one of the candidates, and built a pre-fab tweet factory imitating a grass-roots campaign that attacked news organizations and reporters (a technique known as “astroturfing”).

Like propaganda in society, spam will stay with us in cyberspace. And as we increasingly look to the Web for information, it is important that we be able to detect misinformation. Arguably, now is the most important time for such detection, since we do not currently have in cyberspace a system of trusted editors like the one that has served us well in the past (newspapers, publishers, institutions). What can we do?

* Retweeting reveals communities of like-minded people: Two large groups naturally arise when one considers the retweeting patterns of those tweeting during the 2010 MA special election. Sampling reveals that the smaller group contains liberals and the larger one conservatives. The larger group appears to consist of three different subgroups.
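The idea behind this observation can be sketched in a few lines: treat each retweet as an edge between two users and look at which users end up connected to each other. The data below is made up, and real analyses of retweet graphs use modularity-based community detection rather than plain connected components, which stand in here only for simplicity.

```python
# Sketch: retweet patterns as an undirected graph, with communities
# approximated by connected components (hypothetical users and retweets).
from collections import defaultdict

def retweet_communities(retweets):
    """`retweets` is a list of (retweeter, original_author) pairs."""
    graph = defaultdict(set)
    for retweeter, author in retweets:
        graph[retweeter].add(author)
        graph[author].add(retweeter)
    seen, communities = set(), []
    for start in list(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            user = stack.pop()
            if user in component:
                continue
            component.add(user)
            stack.extend(graph[user] - component)
        seen |= component
        communities.append(component)
    return communities

# Two clusters of users who only retweet within their own group.
sample = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
          ("b1", "b2"), ("b2", "b3")]
groups = retweet_communities(sample)
```

Since no retweet crosses the two groups, the graph splits into exactly two components, mirroring how disjoint political communities surface in real retweet data.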

Some promising research in social media has shown the potential of using technology to detect astroturfing. In particular, the following rules hold true most (though not all) of the time:

  1. The credibility of the information you receive is related to the trust you have in the original sender and in those who retweeted it.
  2. Not only do Twitter friends (those that you follow) reveal a similarly-minded community, their retweeting patterns make these communities stronger and more visible.
  3. While both truths and lies propagate in cyberspace, lies have shorter life-spans and are questioned more often.
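The first rule suggests a simple scoring heuristic: weight a tweet by the trust placed in its original sender and in everyone who retweeted it. The function, the trust values, and the averaging scheme below are all hypothetical illustrations of the rule, not the scoring used in the research itself.

```python
# Hypothetical sketch of rule 1: credibility as the average trust score
# (on a 0..1 scale) of the original sender and the retweeters.
def credibility(sender, retweeters, trust):
    """Unknown users default to a neutral 0.5; all weights are illustrative."""
    scores = [trust.get(sender, 0.5)]
    scores += [trust.get(user, 0.5) for user in retweeters]
    return sum(scores) / len(scores)

trust = {"news_org": 0.9, "friend": 0.8, "stranger": 0.2}
score = credibility("news_org", ["friend", "stranger"], trust)
```

A trusted sender can thus be dragged down by untrusted retweeters, and vice versa, which matches the intuition that retweet chains carry credibility signals of their own.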

While we might prefer an automatic, algorithmic way of detecting misinformation, this will not happen. Citizens of cyberspace must become educated about how to detect misinformation, be provided with tools that help them question and verify information, and draw on the strengths of crowdsourcing through their own groups of trusted editors. This Berkman conference will help us push in this direction.

