Fellowship Opportunity: Media Cloud Project Fellow

The Berkman Center for Internet & Society at Harvard University and the MIT Center for Civic Media seek a fellow to join Media Cloud, a project led by Yochai Benkler and Ethan Zuckerman and driven by an interdisciplinary team of staff and researchers. The fellow will lead the production of scholarly papers and outputs, advance specific research threads, and contribute to the development of research methods and tools.

Media Cloud is an open source, open data platform that allows researchers to answer complex quantitative and qualitative questions about the content of online media and helps to support a novel, data-driven perspective on the dynamics of online conversations.

This fellowship offers an early-stage researcher in social science, data science, or Internet & society the chance to explore the state of digital media through analysis of networks, big data, social media, mobilization, and attention online. This is a unique opportunity to study the networked public sphere and its impact on current political debates.

A full position description for the job can be found below and on the Harvard Human Resources website.

Please note that applications for this fellowship must be submitted through the Harvard Human Resources website, and will not be collected or coordinated directly through the Berkman Center.  Apply for the Media Cloud fellowship here.

Key Responsibilities Include

  • leading a research effort to study online debates related to Sexual and Reproductive Health and Rights (SRHR), investigating the framing of terms around sexuality, reproductive health and rights, as well as training a small subset of advocates working on these issues on the use of our tools;
  • working closely with project members from the Berkman Center and Center for Civic Media to coordinate and conduct research, write, develop project outputs, and prepare publications;
  • documenting processes and strategies for using Media Cloud;
  • cultivating and supporting relationships among faculty and other experts in media, social science, law, computer science, journalism, data visualization, and other fields to understand substantive issues and objectives, assess needs and capabilities, and collaboratively develop research methods to meet the project’s broader goals;
  • contributing to the development of the research and technical platforms;
  • planning, communicating, and implementing Media Cloud workshops and convenings;
  • managing the selection, oversight, and mentorship of student researchers;
  • developing plans and timelines to advance project priorities and meet deadlines; and
  • providing additional project support as needed.

The fellowship is positioned to share time between the Berkman Center and the Center for Civic Media at MIT, working closely with researchers, staff, and faculty at both institutions. The community of fellows and researchers at the two Centers includes a wide range of people working on issues related to Internet and society, including scholars, practitioners, innovators, and others committed to understanding and advancing the public interest.

Basic Qualifications

Bachelor’s degree required.

Additional Qualifications

Candidate should be energetic and passionate about working on issues related to social mobilization and network analysis. Superior writing and verbal skills, sound judgment, exceptional ethical standards, and proven abilities in interpersonal communication, supervision, project management, and team building are required. The fellow will have heart, verve, and vigor; a can-do attitude; a very good sense of humor; and a strong desire to effect change in the world.

Demonstrable knowledge of statistical and content analysis, text analysis and natural language processing, visualization, and the ability to produce large social science research outputs is preferred. Familiarity with the intersections of policy with technology, regulation, and social impact is beneficial.

An advanced degree in a related field is strongly preferred. Experience with technical, substantive, and organizational work for non-governmental or academic organizations is useful, in addition to experience in managing and guiding participating researchers and collaborators.

Additional Information

The Centers encourage and support inviting and rigorous intellectual environments. The right candidate will thrive in a committed, collaborative, and tight-knit community that encourages creativity and humor, supports deep inquiry, values novel approaches to solving problems, strives for transparency, continually builds upon best practices and lessons learned, and supports its community members’ independent and collective goals.

In order to most fully and efficiently carry out his or her duties, the candidate will attend workshops and conferences, and will have frequent opportunities to expand his or her knowledge.

The annual salary is $48,000, and the position is eligible for health benefits.

The position is administratively housed at the Berkman Center, and will report to the Research Director at the Berkman Center.

As with all Berkman Center positions, this is a term appointment expected to continue through 12/31/14, subject to department need and funding, with strong potential for continuation based on funding and institutional need.

About Media Cloud

Using Media Cloud, academic researchers, journalism critics, and interested citizens can examine what media sources cover which stories, what language different media outlets use in conjunction with different stories, and how stories spread from one media outlet to another. Media Cloud was the research tool used to develop the recent publication, “Social Mobilization and the Networked Public Sphere: Mapping the SOPA-PIPA Debate.”  For more information about Media Cloud, visit: http://mediacloud.org.

About the Berkman Center for Internet & Society

The Berkman Center for Internet & Society at Harvard University is a research program founded to explore cyberspace, share in its study, and help pioneer its development. Founded in 1997, through a generous gift from Jack N. and Lillian R. Berkman, the Center is home to an ever-growing community of faculty, fellows, staff, and affiliates working on projects that span the broad range of intersections between cyberspace, technology, and society. More information can be found at http://cyber.law.harvard.edu.

About the MIT Center for Civic Media

The MIT Center for Civic Media works hand in hand with diverse communities to collaboratively create, design, deploy, and assess civic media tools and practices. We are inventors of new technologies that support and foster civic media and political action, we are a hub for the study of these technologies, and we coordinate community-based design processes locally in the Boston area, across the United States, and around the world. Bridging two established programs at MIT—one known for inventing alternate technical futures, the other for identifying the cultural and social potential of media change—the Center for Civic Media is a joint effort between the MIT Media Lab and the MIT Comparative Media Studies Program. It is made possible by funding from the Knight Foundation.  More information can be found at http://civic.mit.edu/.

Commitment to Diversity

The work and well-being of the Berkman Center for Internet & Society at Harvard University are strengthened profoundly by the diversity of our network and our differences in background, culture, experience, national origin, religion, sexual orientation, and much more. We actively seek and welcome applications from people of color, women, the LGBTQIA community, and persons with disabilities, as well as applications from researchers and practitioners from across the spectrum of disciplines and methods.

If the Media Cloud fellowship is not for you, but you’re interested in Berkman’s fellowship program, please read more on our fellowship page.  We are currently accepting applications for 2014-2015 fellowships through our open call, through which applicants propose their own course of study.  Applications through the open call will be accepted through December 8, 2013.


New Publication: “Social Mobilization and the Networked Public Sphere: Mapping the SOPA-PIPA Debate”

The Media Cloud team is pleased to announce the release of a new paper, Social Mobilization and the Networked Public Sphere: Mapping the SOPA-PIPA Debate, along with a set of interactive maps that cover selected weeks in the COICA-SOPA-PIPA controversy.

In this paper, we use a new set of online research tools to develop a detailed study of the public debate over proposed legislation in the United States that was designed to give prosecutors and copyright holders new tools to pursue suspected online copyright violations. Our study applies a mixed-methods approach by combining text and link analysis with human coding and informal interviews to map the evolution of the controversy over time and to analyze the mobilization, roles, and interactions of various actors.

This novel, data-driven perspective on the dynamics of the networked public sphere supports an optimistic view of the potential for networked democratic participation, and offers a view of a vibrant, diverse, and decentralized networked public sphere that exhibited broad participation, leveraged topical expertise, and focused public sentiment to shape national public policy.

We also offer an interactive visualization that maps the evolution of a public controversy by collecting time slices of thousands of sources, then using link analysis to assess the progress of the debate over time. We used the Media Cloud platform to depict media sources (“nodes”, which appear as circles on the map with different colors denoting different media types). This visualization tracks media sources and their linkages within discrete time slices and allows users to zoom into the controversy to see which entities are present in the debate during a given period as well as who is linking to whom at any point in time.
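The time-slice idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Media Cloud implementation: the weekly slice length, the `(day, source, target)` link format, the function names, and the sample domains are all hypothetical. The sketch groups links into weekly slices and counts in-links per source, which is one simple way to see who is being linked to in a given period.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def inlinks_by_slice(links):
    """Group (day, source, target) links into weekly slices.

    Returns a dict mapping each slice's start date to a Counter of
    in-link counts per target. Self-links are ignored.
    """
    slices = defaultdict(Counter)
    for day, source, target in links:
        if source != target:
            slices[week_start(day)][target] += 1
    return dict(slices)


# Hypothetical sample data: (publication day, linking source, linked source).
links = [
    (date(2012, 1, 16), "blog-a.example", "news-b.example"),
    (date(2012, 1, 18), "blog-c.example", "news-b.example"),
    (date(2012, 1, 18), "news-b.example", "blog-a.example"),
    (date(2012, 1, 24), "blog-a.example", "blog-c.example"),
]

weekly = inlinks_by_slice(links)
for start in sorted(weekly):
    print(start, weekly[start].most_common())
```

Comparing the per-slice counts from one week to the next gives a rough picture of which sources gain or lose attention as a debate progresses.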

The authors wish to thank the Ford Foundation and the Open Society Foundation for their generous support of this research and of the development of the Media Cloud platform.

About Media Cloud

Media Cloud, a joint project of the Berkman Center for Internet & Society at Harvard University and the Center for Civic Media at MIT, is an open source, open data platform that allows researchers to answer complex quantitative and qualitative questions about the content of online media. Using Media Cloud, academic researchers, journalism critics, and interested citizens can examine what media sources cover which stories, what language different media outlets use in conjunction with different stories, and how stories spread from one media outlet to another. We encourage interested readers to explore Media Cloud.



Media Cloud 2.0 Pre-alpha release 00.00.05.

We have released a pre-alpha version of Media Cloud 2.0. The changes since our initial release in 2009 are too numerous to list. Perhaps most noticeably, the install process has been vastly improved. Many users will be able to install Media Cloud simply by running a single script instead of having to manually configure and install numerous Perl modules.

A source distribution of this release is available on Sourceforge at the following location:


We have also put together an Ubuntu virtual machine image with Media Cloud already installed. It’s available here:


This is still pre-alpha software, but we hope this release provides an easier and more stable alternative to installing Media Cloud from source control.


HTML::TextCruft CPAN module released

We have extracted a piece of the Media Cloud code base and released it as HTML::TextCruft, a standalone CPAN module. HTML::TextCruft is the first part of the code to extract article text from HTML and remove ads, navigation, and other cruft.

Media Cloud has always been free and open source but since it is a large code base not everyone is able to install it. By releasing this piece as a separate module, we hope that its functionality will be more accessible to the wider community.

More information on HTML::TextCruft is available on its CPAN page.


Media Cloud is Participating in Google Summer of Code 2012

Media Cloud is excited to be participating in Google Summer of Code this year through the Berkman Center for Internet & Society. Google Summer of Code (GSoC) is a global program in which Google offers students stipends to work on Open Source projects. Media Cloud received valuable contributions from our students when we participated in 2009 and 2010 and we’re looking forward to this year’s program.

For students who are interested in working on Media Cloud through Google Summer of Code, we have put together a list of possible Media Cloud projects here. There is also a Berkman Wiki listing Berkman-specific GSoC requirements as well as a number of other interesting Berkman projects also participating in GSoC. Finally, the GSoC homepage contains detailed information about GSoC policies and eligibility requirements.

The final application deadline is April 6 at 19:00 UTC but early applications are preferred.


Russian Media for the Week of 6/27/2011 – 7/03/2011

Russian media this week has seen the emergence of a number of prominent stories, including themes related to Russia’s budget and banking system, political appointments, energy politics, Russia’s relations with neighboring countries, bills being debated by the Duma, and concerns over forest fires in the country’s far east.

Week of June 27 – July 3 (Red) Compared to June 20 – June 26 (Blue) for Five Major Russian Media Segments (TV, Pop Blogs, Random Blogs, Mainstream Media, Government):

New issues related to domestic politics and finance seem to dominate the overall week-to-week comparison cloud, indicated by the emergence of new high frequency words (in red) such as “банк” (bank), “бюджет” (budget), “газа” (gas), and “национальной” (national).  The frequent discussion of banks this week is in part accounted for by the catastrophic failure and subsequent bailout of the Bank of Moscow, Russia’s fifth largest bank.  In what is reputed to be “the largest bailout in modern Russian history,” the bank will receive as much as $14 billion in state-backed loans, with the state-run VTB Bank increasing its stake in the company to 75%.

The Russian budget and budgetary constraints were also an important theme in this week’s news.  On Wednesday, 6/29, President Dmitry Medvedev delivered an address to the Duma laying out his three-year budget guidelines for the 2012-2014 period.  Focusing on governance efficiency, modernization, competitiveness, long-term development, and living standards, the President laid out 12 vital areas of budget policy that will be central to achieving national economic goals in the coming years.  In addition to his ongoing emphasis on modernization, Medvedev stressed the need for economic decentralization, with development occurring on a regional level and not just in and around the capital cities.  Budgets were also discussed in several other contexts this week, helping to account for the appearance of “бюджет” in the week’s overall word cloud.  Prime Minister Vladimir Putin made headlines for drawing attention to the need to ensure the new budget would be deficit-free.  News stories also discussed the protests in Greece related to that country’s budget debate and the possible implications for Russian oil revenue.  Nezavisimaya Gazeta reported on a new study that shows Russians on average spend 30% of their household budget on food, with poorer families spending as much as 50% of their income.  The discussion of Medvedev’s budgetary plan and related topics clearly dominated the Government media segment for the week.  New high-frequency words there such as “развивать” (develop), “реализации” (implementation), “региональных” (regional), “современные” (modern), “экономики” (economy), “экономической” (economic), and “технического” (technical) indicate the frequent discussion of some of the main components of Medvedev’s plan.

Week of June 27 – July 3 (Red) Compared to June 20 – June 26 (Blue) for Russian Government:

A couple of last week’s major stories continued to attract attention this week, with related terms showing up in purple in the week-to-week comparison cloud.  These include, for example, the nomination of St. Petersburg Governor Valentina Matvienko to become Speaker of the Federation Council.  With the approval of Medvedev and Putin, this week Matvienko agreed to accept the new position.  Opposition formed in Saint Petersburg, with young Yabloko party members protesting in the street on Wednesday and the formation of an opposition bloc entitled “St. Petersburg against Matvienko.”  As the city’s governor since 2003, Matvienko had become increasingly unpopular, resented by local residents for her government’s failure to clear the streets of snow and ice in the winter.  Many have speculated that Matvienko’s move was part of an effort to buoy support for the United Russia party in preparation for the upcoming Duma elections this December.  This story’s continued prominence is indicated by the frequency of words such as “петербург” (Petersburg), “федерации” ([of the] federation), and “совет” (council) in the week’s overall cloud.  Drilling down into specific media segments, the attention garnered by Matvienko’s high-profile move becomes even more apparent, with her name (“Матвиенко”) and the word “губернатор” (governor) appearing among the new high-frequency words in this week’s Mainstream Media word cloud.

Week of June 27 – July 3 (Red) Compared to June 20 – June 26 (Blue) for Russian Mainstream Media:

Some additional prominent topics in the week’s news also become more apparent on examining some of the other week-to-week comparisons for particular media segments.  The ongoing controversy surrounding the corruption accusations against and trial of former Orange Revolution leader Yulia Tymoshenko in Ukraine, for example, attracted the attention of some news segments more than others.  The former prime minister was indicted last December for abuse of power, with President Victor Yanukovich claiming that she illegally used $425 million in “Kyoto money” (money received from the sale of carbon emission quotas) to finance pensions.  If she is found guilty, Tymoshenko will be banned from holding political office.  While some variant on “Украина” (Ukraine) appears as a high-frequency word over the last couple of weeks in the Mainstream Media and the Popular Blogs word clouds, this topic appears not to have received equal attention in all media segments.  A comparison between popular blogs and TV media shows that this story appears to have gotten significantly more attention in the blogosphere than in television news coverage, as demonstrated by the appearance of “Украины” in red in the word cloud comparing these two media segments.

Russian Popular Blogs (Red) versus Television (Blue) for Week of June 27 – July 3:

A similar contrast can be seen in the coverage of ongoing conflict between Russia and Belarus over unpaid electricity debt for April and May.  Belarus, which has been suffering a deep economic crisis over the last several months, owes Russia some 1.2 billion rubles ($43 million), a situation which came to a head this week, with the Kremlin threatening to cut off Belarusian electricity supplies if this debt was not repaid by Wednesday.  Though the immediate crisis was resolved by week’s end with Belarus promising to pay its debt and Russia restoring power supplies, the tension between the two countries continued, with disagreement as to the extent to which natural gas prices should be reduced in light of the recent Belarusian currency devaluation.  This story, as with that concerning Ukraine, appears to have received more attention in some media segments than others.  In contrast to the Ukrainian trial, this story seems to have been covered more by television and mainstream media and received less scrutiny in the blogosphere.  Note the appearance of “Белоруссия” (Belarus) in blue in the word cloud comparing high-frequency words in the week’s TV and Popular Blog media segments.


Russian Media for the Week of 6/20/2011 – 6/26/2011

Russian media this week has been dominated by several new themes, relating to national history, disasters, and high politics.  The red words in the word cloud below indicate words that appeared in this week’s news with unusually high frequency, showing a contrast with the previous week.  (Blue words show high frequency words unique to the previous week, and purple indicates words that appeared with significant prevalence both weeks – generally representative of recurrent themes.)
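For readers who want a concrete picture of the color scheme, here is a minimal Python sketch. It is not the Media Cloud code: it assumes, for illustration only, that "high frequency" simply means the top n words by raw count in each week, and the function names and token lists are hypothetical.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the set of the n most frequent words in a list of tokens."""
    return {word for word, _ in Counter(tokens).most_common(n)}


def color_words(this_week, last_week, n=3):
    """Label each top word red (this week only), blue (last week only),
    or purple (a top word in both weeks)."""
    current = top_words(this_week, n)
    previous = top_words(last_week, n)
    colors = {}
    for word in current | previous:
        if word in current and word in previous:
            colors[word] = "purple"
        elif word in current:
            colors[word] = "red"
        else:
            colors[word] = "blue"
    return colors


# Hypothetical token lists standing in for two weeks of coverage.
this_week = ["банк"] * 3 + ["бюджет"] * 2 + ["совет"] * 2 + ["газа"]
last_week = ["совет"] * 3 + ["самолет"] * 2 + ["война"] * 2 + ["банк"]

colors = color_words(this_week, last_week)
for word in sorted(colors):
    print(word, colors[word])
```

A real comparison would likely normalize counts by the size of each week's corpus and filter out stop words before ranking; the sketch leaves both steps out to keep the coloring logic visible.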

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Five Major Russian Media Segments (TV, Pop Blogs, Random Blogs, Mainstream Media, Government):

As is clear from this week’s overall comparative word cloud across five major media segments, one of the dominant themes in the week’s media has been the 70th anniversary of the German invasion of Russia that marked the beginning of the Great Patriotic War (World War II).  The German invasion of the Soviet Union (Operation Barbarossa) began on June 22nd 1941 when Nazi tanks entered Soviet territory near the town of Brest in Belarus.  It was the beginning of four years of war in which over 20 million Soviet soldiers and civilians would perish (over 13% of the population).  The anniversary, referred to as a national “Day of Memory and Sorrow,” was somberly recalled in memorial events across Russia this week.  The unusually high occurrence of various forms of words such as “война” (war), “служба” (service), “великий” (great [patriotic war]), and “военный” (military) indicates the frequency with which the war and its legacy were discussed across the five media segments over the course of this week.  Some variants of one or more of these words appear clearly in the week’s word clouds for both Mainstream Media and Television, indicating that the story had particular prominence across these segments.  In popular blogs, we also see higher than usual discussion involving words such as “советский” (Soviet), often involving discussion of Soviet history and the legacy of the war.

One of the other major stories of the week was the June 20th crash of a passenger airplane (a Tupolev 134A-3) en route from Moscow to Petrozavodsk.  Flight RA-65691 of the airline RusAir (Русэйр) crashed and broke apart on landing, killing forty-seven out of fifty-two occupants.  This story is clearly indicated by prominent words in the week’s word cloud, such as “самолет” (airplane) and “петрозаводск” (Petrozavodsk).  One or both of these words appear in the week’s word clouds for both the Mainstream Media and TV.  The story apparently also received some prominent attention in the Government press, with “мчс” (acronym for the Russian Emergencies Ministry) appearing as one of the week’s highest frequency words for that news segment.  This theme seems to have been particularly picked up in Russian television, with additional words such as “авиакатастрофе” (aviation accident), “больницы” (hospitals), “погибших” (dead/deceased), “аэропорт” (airport), “пассажир” (passenger), “транспорт” (transportation), and “транспортакатастрофы” (transportation accident) featuring as unusually high frequency words visible in the segment-specific weekly word clouds.

A third significant set of stories of this week had to do with the appointments and nominations of officials for government positions.  Specifically, this included President Medvedev’s appointment of officials to fill leadership positions in the Ministry of the Interior (Министерство Внутренних дел Российской Федерации), the President’s apparent support for Saint Petersburg Governor Valentina Matvienko’s nomination as the new Speaker of Russia’s Federation Council (Совет Федерации), and the reappointment of Yuri Chaika as Prosecutor General (Генеральный Прокурор) by the Federation Council.  These stories are indicated by the prevalence of words such as “министерства” (ministry), “внутренних” (internal), “совет” (council), “федерации” ([of the] federation), and “генерал” (general).  The coverage of these news events appears to have been particularly strong, not surprisingly, across the Government media segment, though they also have received some attention in TV, Mainstream Media, and Popular Blogs.

Below are the week’s comparative word clouds from each of the five media segments (TV, mainstream media, government, popular blogs, and a random sample of all blogs).  Click on these figures to view interactive word clouds from which to explore themes of interest.

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Russian TV:

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Russian Mainstream Media:

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Russian Government:

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Russian Popular Blogs:

Week of June 20 – June 26 (Red) Compared to June 13 – June 19 (Blue) for Russian Random Blogs:



Russian Media for the Week of 6/12/2011 – 6/18/2011

This week’s Russian word cloud shows some new trends and stories that differ from those of the previous week, though there have been few dramatic shifts in coverage.  The most striking new story to emerge here appears to be that of Colonel Yuri Budanov (Полковник Юрий Буданов), who was murdered while awaiting trial for the rape and murder of a young girl in Chechnya.  This story accounts for several of the increased frequency words that emerge in this week’s word cloud – a pattern also separately visible across all major media segments except for official government sources.  On closer inspection, some other stories have acquired new or renewed attention in particular media segments, with coverage of Ukraine and Mikhail Khodorkovsky featuring prominently in popular blogs and television media respectively.

Words in four prominent media segments (popular blogs, mainstream media, government, television) during the week starting 2011-06-05 (Blue) versus during the week starting 2011-06-12 (Red):

The word cloud above, comparing a combined set of main media sources from June 12th through June 18th 2011 (red) with the same set of sources over the previous week, June 5th through June 11th 2011 (blue), shows several new stories emerging (red), but none of these are at as high a word frequency as the major words in purple (mentioned frequently both weeks) or even as the major words from the previous week (in blue).  The cloud compares the combined sets of popular blogs, mainstream media sources, government media content, and television media content across the two weeks.

Some of the newly prominent words do not appear to represent any major new stories: ubiquitous names and financial terms likely appear as top words only because of a relative decline in other major stories with more uncommon terms.

The overall cosine similarity across the four media segments in Media Cloud between the weeks of June 5-11 and June 12-18 is 0.905, demonstrating a fairly high level of similarity between the two weeks.  This level of variation is not constant across all media forms, however.  We see some dissimilarities in the patterns of change within distinct media sources.
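As a reminder of what this number measures, cosine similarity treats each week's word counts as a vector and takes the dot product divided by the product of the two vector lengths, so 1.0 means identical word proportions and 0.0 means no words in common. The Python sketch below shows the calculation on toy data. It assumes raw word counts as input; the weighting and preprocessing actually used in Media Cloud are not shown here, and the sample counts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count mappings.

    Returns 0.0 if either mapping is empty or all zeros.
    """
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word counts for two weeks of coverage.
week_one = Counter({"budget": 1, "bank": 1})
week_two = Counter({"budget": 1})

print(round(cosine_similarity(week_one, week_two), 3))
```

Because the measure depends only on the direction of the vectors, a week with twice as many stories but the same mix of words still scores 1.0 against the smaller week.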

Government sources here seem to have shown the most significant changes in topical foci between the two weeks, with TV and mainstream media showing the second greatest amounts of change, both showing lower week-to-week cosine similarity scores than popular blogs during this period.  This is interesting, as it indicates that the blogosphere’s topical foci have remained relatively constant while some new topics have been introduced to (or have disappeared from) the mainstream media, TV, and government sources.

In terms of coverage of key stories, it appears that there is substantial difference between the topics receiving greatest attention across the different media segments.  Most of this variation has persisted over the last week and does not reflect a dramatic shift caused by differing coverage of a suddenly emerging pivotal story.

As we can see here, there has in fact been a modest convergence in the similarity of different news sources in the last week.  Even so, the differences across segments are striking.  The following word cloud shows the comparison between the content of popular blogs versus government media outlets during the June 12th-18th period.

Words in Popular Blogs (Blue) during the week starting 2011-06-12 versus words in Government media sources (Red) during the same week:

Here we see that coverage of war, other countries (including the US and Ukraine), Moscow, words related to the internet, politics, and the Budanov murder (colonel, Budanov, murder) all receive more attention in the popular blogs, whereas words related to economics (budget, financial), governance (regional, municipal, federal, law), and citizenship (self-governance, participation, citizen) feature prominently in the government media sources.

The extremely low cosine similarity value between popular blogs and government sources is consistent with tendencies noted in previous blog posts.  Perhaps more surprising is the fact that TV media sources appear even more dissimilar from government sources, with these two media segments showing the lowest cosine similarity for the week at 0.318.

Words in TV (Blue) during the week starting 2011-06-12 versus words in Government media sources (Red) during the same week:

Here the high frequency words from TV (blue) show significant difference from those appearing frequently in government sources (red) with very little overlap (purple) in high frequency words.  While this does not definitively indicate a lack of similarity in coverage (or lack of coverage) of some topics, it certainly appears to indicate that there is a fair degree of dissimilarity in the topics that are covered.  In addition to the TV coverage of the Budanov murder (which did not receive frequent mention in government sources), the TV sources for the week included more prominent discussion of Khodorkovsky, war, other countries (including Europe), and cultural items such as film and festivals.

As these last couple of examples indicate, some of the dissimilarity here could have to do with non-news content in the TV news feed (or at least a broader definition of news to include things not addressed by government media sources); but, as demonstrated by the other examples of non-overlapping frequent words, it appears there also is some substantial difference in the primary news content.


Weekly Update: Week of May 23

You know it’s a slow news week when there’s this much baseball–and soccer!–showing up in the US mainstream media:


Russian Media Cloud Comparative Analysis

Using Cosine Similarity to compare week to week coverage in Russian media

What are the differences in how various Russian media outlets – traditional and web native – cover events?  How does coverage differ between sources during the same time period? Does coverage overlap – or do different outlets highlight different events?  What do these choices tell us about media outlet priorities and preferences?

With these questions in mind, we used Russian Media Cloud to determine the levels of similarity between four Russian media sets (Russian government websites, Russian television news, Russian mainstream media websites, and popular Russian blogs) over two weeks in April 2011.  We use cosine similarity to determine how similar the content of the media sets is to one another, a method described in this earlier post by Ethan Zuckerman.
