
~ Archive for Web Spam ~

Artificial Intelligence, your brain, and other things you cannot trust about politics


A few days ago the Center for Research on Computation and Society organized a workshop with the provocative title “Six Reasons Fake News is the End of the World as we Know It”. I call it provocative because the question of whether “fake news” is a new thing or not has been discussed a lot lately. Not all of us agree on what it is, or how novel it is. Some point out that it is as old as newspapers; others see it as something that mainly appeared last year. Yet others doubt that it is even a phenomenon worth discussing and argue that, instead of fake news, we should talk about specific categories such as false news, misinformation, disinformation, and propaganda.

Accepting the challenge, I gave a talk with an equally provocative, I would like to believe, title: “Artificial Intelligence, your brain, and other things you cannot trust about politics”. You can follow my talk in the video below, but let me give you a list of the “things” that I discussed in the talk:

[Image: what you cannot trust about politics]

I hope you find it interesting and do your own thinking about what we can trust when it comes to politics. Importantly, we need to figure out how to solve the problems of online misinformation and propaganda that seem to be all around us these days.

Or, to learn how to live with them, which is what I think will happen.

The Real “Fake News”


The following is a blog post that Eni Mustafaraj has recently published in The Spoke. We reproduce it here with permission.


Fake news has always been with us, starting with The Great Moon Hoax in 1835. What is different now is the existence of a mass medium, the Web, that allows anyone to financially benefit from it.

Etymologists typically track the change of a word’s meaning over decades, sometimes even over centuries. Currently, however, they find themselves observing a new president and his administration redefine words and phrases on a daily basis. Case in point: “fake news.” One would have to look hard to find an American who hasn’t heard this phrase in recent months. The president loves to apply it as a label to news organizations that he doesn’t agree with.

But right before its most recent incarnation, the phrase “fake news” had a different meaning. It referred to factually incorrect stories appearing on websites with names such as DenverGuardian.com or TrumpVision365.com that mushroomed in the weeks leading up to the 2016 U.S. Presidential Election. One such story, “FBI agent suspected in Hillary email leaks found dead in apparent murder-suicide”, was shared more than half a million times on Facebook, despite being entirely false. The website that published it, DenverGuardian.com, was operated by a man named Jestin Coler, who, when tracked down by persistent NPR reporters after the election, admitted to being a liberal who “enjoyed making a mess of the people that share the content”. He didn’t have any regrets.

Why did fake news flourish before the election? There are too many hypotheses to settle on a single explanation. Economists would explain it in terms of supply and demand. Initially, there were only a few such websites, but their creators noticed that sharing fake news stories on Facebook generated considerable pageviews (the number of visits to a page) for them. Their obvious conclusion: there was a demand for sensational political news from a sizeable portion of the web-browsing public. Because pageviews can be monetized by running Google ads alongside the fake stories, the response was swift: an industry of fake news websites grew quickly to supply fake content and feed the public’s demand. The creators of this content were scattered all over the world. As BuzzFeed reported, a cluster of more than 100 fake news websites was run by individuals in the remote town of Veles, in the Former Yugoslav Republic of Macedonia.

How did the people in Macedonia manage to spread their fake stories on Facebook and earn thousands of dollars in the process? In addition to creating a cluster of fake news websites, they also created fake Facebook accounts that looked like real people and then had these accounts join real Facebook groups, such as “Hispanics for Trump” or “San Diego Berniecrats”, where conversations about the election were taking place. Every time the fake news websites published a new story, the fictitious accounts would share it in the Facebook groups they had joined. The real people in the groups would then start spreading the fake news article among their Facebook followers, successfully completing the misinformation cycle. These misinformation-spreading techniques were already known to researchers, but not to the public at large. My colleague Takis Metaxas and I discovered and documented one such technique used on Twitter all the way back in the 2010 Massachusetts Senate election between Martha Coakley and Scott Brown.

There is an important takeaway here for all of us: fake news doesn’t become dangerous because it’s created or because it is published; it becomes dangerous when members of the public decide that the news is worth spreading. The most ingenious part of spreading fake news is the step of “infiltrating” groups of people who are most susceptible to the story and will fall for it.  As explained in this news article, the Macedonians tried different political Facebook groups, before finally settling on pro-Trump supporters.

Once “fake news” entered Facebook’s ecosystem, it was easy for people who agreed with the story and were compelled by the clickbait nature of the headlines to spread it organically. Often these stories made it to Facebook’s Trending News list. The top 20 fake news stories about the election received approximately 8.7 million views on Facebook, 1.4 million more views than the top 20 real news stories from 19 of the major news websites (CNN, New York Times, etc.), as an analysis by BuzzFeed News demonstrated. Facebook initially resisted the accusation that its platform had enabled fake news to flourish. However, after weeks of intense pressure from the media and its user base, it introduced a series of changes to its interface to mitigate the impact of fake news. These include involving third-party fact-checkers to assign a “Disputed” label to posts with untrue claims, suppressing posts with such a label (making them less visible and less spreadable), and allowing users to flag stories as fake news.

It’s too early to assess the effect these changes will have on the sharing behavior of Facebook users. In the meantime, the fake news industry is targeting a new audience: liberal voters. In March, the fake quote “It’s better for our budget if a cancer patient dies more quickly,” attributed to Tom Price, the Secretary of Health and Human Services, appeared on a website titled US Political News, operated by an individual in Kosovo. The story was shared over 80,000 times on Facebook.

Fake news has always been with us, starting with The Great Moon Hoax in 1835. What is different now is the existence of a mass medium, the Web, that allows anyone to monetize content through advertising. Since the cost of producing fake news is negligible, and the monetary rewards substantial, fake news is likely to persist. The journey that fake news takes only begins with its publication. We, the reading public who share these stories, triggered by headlines engineered to make us feel outraged or elated, are the ones who take the news on its journey. Let us all learn to resist such sharing impulses.

Defending your Domain Name


I recently had the uninvited opportunity to defend my domain name, metaxa.net, and I am writing this post because it may be useful to others who find themselves in such a position. The good news: You do not need to be a lawyer to do it yourself and the arbitration system works reasonably well. The bad news: You need to do a bit of reading and writing.

I registered metaxa.net in 1999 and have owned it ever since. Back then it was essential to own a domain name: there were no services to upload and share your photos, no easy ways to have email addresses for you or your family members, and the only clouds around were still up in the sky, unable to store any files you might need while away from your office.

As Web services expanded to cover everything under the sun, the above uses became secondary, and I started utilizing the domain name for other reasons. In 2004 I had started a line of research to discover how Web Spammers succeeded in gaming search engines and placing their bogus postings on the first page (the top 10) of relevant search results. Back then it was thought that the “PageRank” algorithm was like “42”: the answer to everything (related to the Web). We now know that search engines can be gamed, and that they spend considerable resources to avoid Web Spam.

[Side note: My research led me to discover the reasons that search engines can be gamed: pretty much the same reasons that we humans can be fooled. Web Spammers were using techniques very similar to the propagandistic techniques that politicians, advertisers and financial criminals use to persuade us to vote for something, buy something or invest in something. I presented my work initially at AIRWeb 2005 (Web Spam, Propaganda and Trust), but if you are interested in reading more you should check the journal version (Web Spam, Social Propaganda and the Evolution of Search Engine Rankings).]

Anyway, to implement a technique for discovering Web Spam sites I needed a method to evaluate similarity between Web site contents. It helped to have a Web site containing text that is unbiased towards a particular theme or product. Since there was nothing available online, I uploaded onto my own site a large collection of Associated Press news articles that were used by the TREC community. If you visited the top directory of metaxa.net you would be surprised to find huge files containing old news. But it was very unlikely that you would visit it. I did not include metaxa.net in any search engine listing, so it would never appear in your search results. Though I never planned to use metaxa.net as a text repository, over the years it proved very useful. Many of my students used it to run their research projects, and after graduation some of them ended up working at Microsoft Bing and Google Search, fighting Web Spam on a regular basis.
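
To give a flavor of what "evaluating similarity between Web site contents" can mean, here is a minimal sketch. It is not the method used in the research described here, only one common way to compare two texts: cosine similarity over word counts. The function names and the two sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors (0.0 to 1.0)."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Only words that appear in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample texts: a neutral news sentence and a spammy page.
news = "The senate voted on the budget bill on Tuesday."
spam = "Cheap pills, cheap pills, buy cheap pills online today."
print(cosine_similarity(news, news))
print(cosine_similarity(news, spam))
```

A score near 1 means the two texts use much the same vocabulary, and a score near 0 means they share almost none. Text that is not tilted towards any theme or product, such as old newswire articles, is handy as a neutral point of comparison.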

I continued to own the domain name, paying the fees on time, so it was a complete surprise when I received a letter in early April about a dispute filed with the World Intellectual Property Organization (WIPO) by Remy-Cointreau Luxemburg, the well-known liqueur-producing company. They wanted to take over my domain name! Remy-Cointreau had bought a well-known Greek spirits company, METAXA, and over the years they had started buying every domain name that contained the string “metaxa” (including “metaxaswineestate”!). Given my research, I knew exactly why they were doing that: they wanted to “persuade” the search engines that any search for the term “metaxa” should lead to their official site only! They knew how to fool PageRank and were doing it legally too. Now they wanted to fool WIPO’s arbitrators. In fact, as they claimed in the Complaint they served me,

Indeed, the term METAXA® is only known in relation to the Complainant. It has no meaning whatsoever in English or in any other language. A Google search on the term METAXA® displays several results, all of them being related to the Complainant

Wow. Three lies in three sentences. Anyone who knows just a bit of modern Greek history or checks Wikipedia knows that the name Metaxas is not that rare. (In Greek, the female version of a name, or a reference to the family name itself, does not include the final “s”.) One can Google translate metaxa to see that it means “raw silk” and “silk trader” in Greek, depending on the intonation. And the first page of Google search results does not constitute proof of unique association. (Yet, even their own submitted screenshot included other METAXA references!) And these were not the only “inaccurate claims” or logical fallacies in the Complaint. There were about a dozen of them. You can see a more detailed list (though not exhaustive) in my Response to their Complaint.

I was doubly stunned. Making such claims ridiculed WIPO, the very body that they were asking to support them. How could they be so arrogant as to insult the intelligence of WIPO’s arbitrators? Wouldn’t they expect that the Respondent would point out their lies?

Probably not. My guess is that they expected that nobody would respond to their Complaint and they would win an uncontested case. You see, when a couple of years ago we changed the domain record hosting company, the WHOIS information was not updated correctly, and it showed that the administrator was to be reached at … nocontactsfound@secureserver.net. It seems reasonable to expect that whoever owned the domain would not be reached, and so they could snatch it without contest. It would cost them a few thousand dollars, but for a company with deep pockets, that would not be a problem.

Unfortunately for them, due to a billing inquiry, I did get informed. Legalese never having been one of my languages, I turned to the wonderful Berkman community for advice. And the advice poured in immediately. Several Berkmanites, and primarily Prof. Jacques de Werra of the University of Geneva, a Faculty Associate at the Berkman Center this academic year, suggested literature, gave me references to other relevant cases, recommended experts in the field, and offered advice on my options. They even pointed out other unfair activities that the company had been involved in in the past. In particular, a song written for the company’s ad campaign was stolen from Berkmanite musician Erin McKeown. You can read all about it in the TechDirt case study A Perspective On The Complexities Of Copyright And Creativity From A Victim Of Infringement.

On to the technical part. It turns out that one can defeat a Complaint by convincing the arbitrators that any one of these elements is not present, since the complainant has to prove all three:

  1. your domain name is identical or confusingly similar to a trademark or service mark in which the complainant has rights;
  2. you have no rights or legitimate interests in respect of the domain name; and
  3. your domain name has been registered and is being used in bad faith.

The detailed policy and rules (called a UDRP, Uniform Domain-Name Dispute-Resolution Policy) are listed here:
http://www.icann.org/en/help/dndr/udrp

In my case, the first element could not be countered: there were good reasons why my domain name was identical to their trademark. But I could convince the arbitrators that the Complaint was wrong on both the second and the third element. Showing that I have legitimate rights was straightforward: I just had to point out the lies within their claims. But you need to provide a complete counterargument. It is not enough to point out a lie and expect that the arbitrators will go and verify it themselves. You have to provide the evidence yourself, with screenshots, excerpts, and clear arguments. And you had better be exhaustive in your arguments because you may only get one shot. For example, even though I was quite sure that I could nullify the second element, I had better nullify the third one as well, just in case. For the third element, I had to go digging for references and receipts showing that I was always the owner of the account and had used it in good faith. Showing “good faith” was important, as this is a main reason why the UDRP exists: to curb the efforts of cyber-squatters who buy domain names only to sell them to the highest bidder among competing companies.

While I was at it, I wanted to point out that the Complaint itself was not filed in good faith. Remy-Cointreau really did not have a case. Only through the dozen or so lies in the filing could they put a case together. I would love to get the arbitrators to acknowledge that the company filed in bad faith. There are no penalties associated with such an acknowledgement, but future arbitrators may take it into account.

In the end the arbitrators denied the Remy Cointreau – Metaxa Complaint, stopping at the fact that the second element was not proven by the company. In addition to losing the case and a few thousand dollars, the company lost the opportunity to persuade search engines of the unique association of its trademark. Now that this is an officially recorded WIPO UDRP case, it may help reduce the number of future frivolous Complaints.

PS. I recently found a good guide on How to Choose the Right Domain Name. I hope you will find it useful.

 

 
