China Bans the Letter ‘F’

12-Jun-09

China recently mandated that Green Dam, a client-side application that filters pornography and political content, be installed on all computers manufactured in China starting July 1, 2009. One way the application blocks access to sites is to kill the browser window when it tries to visit an offending site. The above video demonstrates that the application is so poorly designed that it can end up killing the browser window every time the user types ‘f’ as the first letter in the location window.

downloadable version of the video

What’s happening in the video is that GD fails to block falundafa.org the first time it is loaded, so ‘falundafa.org’ gets into the history of the browser. Eventually GD recognizes the offensive content on the site and kills the whole browser after briefly flashing a ‘you have been filtered’ image. Any time GD flags a site as politically offensive, that url gets entered into a list of urls that trigger a kill-block whenever one of them is entered into the location bar or window. But that auto-kill-block applies to text brought up in the auto-complete list as well as text in the entry box proper. Since falundafa.org is in both the auto-complete list and the auto-kill-block list, every time the user brings up the location window and types in ‘f’ to start a url, the location window and current tab are instantly killed.
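The interaction between the auto-complete list and the kill-block list can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Green Dam's actual code: the function name should_kill, the prefix-matching auto-complete, and the sample history are all assumptions made for the example.

```python
def should_kill(typed_text, history, kill_list):
    """Return True if the (hypothetical) filter would kill the window.

    Assumption: the browser suggests every history entry that starts
    with the typed text, and the filter scans both the entry box and
    the suggestions for any url on its kill list.
    """
    # Suggestions the browser would show for the typed prefix.
    suggestions = [url for url in history if url.startswith(typed_text)]
    # The filter sees the entry box text plus every suggestion.
    visible_text = [typed_text] + suggestions
    return any(blocked in text
               for text in visible_text
               for blocked in kill_list)


history = ["falundafa.org", "example.com"]  # sample browser history
kill_list = {"falundafa.org"}               # urls flagged by the filter

print(should_kill("f", history, kill_list))          # True
print(should_kill("e", history, kill_list))          # False
print(should_kill("f", ["example.com"], kill_list))  # False
```

The last call shows that the letter ‘f’ is only fatal once the flagged url has made it into the browser history; the typed text alone never matches the kill list.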

Presto, China has banned the letter ‘f’!

Full details in our just published ONI report on the tool.

Update: replaced youtube version with local version to support folks blocked from youtube.

Grey Surveillance Talk

20-Jan-09

Ethan Zuckerman wrote up a talk I recently gave on looking at Google’s AdWords as a network of grey surveillance.

Viral Conversations Updates Reviews Policy

20-Jan-09

Viral Conversations has updated its FAQ to no longer suggest that companies let reviewers keep products and to suggest that reviewers disclose within reviews if they are allowed to keep the products they are reviewing. In a previous post, I pointed out that the site’s policy of encouraging companies to gift reviewed products to reviewers and of not encouraging users to disclose those gifts was encouraging fraudulent reviews. The relevant sections of the FAQ now read:

Do I Have to Let the Bloggers Keep The Item?

No, you don’t have to let the bloggers keep the item, in the end it’s up to you. It’s going to depend on a number of factors such as cost and shipping difficulty. Letting the bloggers keep a $25 coffee maker is probably a no brainer, but you may feel a little differently about an $1500 espresso machine. Be as clear as possible in the beginning to avoid any confusion. Additionally if you are letting bloggers keep the item, expect that to be disclosed in the review.

Do I need a Disclaimer on My Post?

We really recommend you do it to be upfront and honest with your readers. It could be something as simple as “The John Smith Camera Company sent me their new ABC-123 DLSR camera to review”. If you do a lot of reviews on your website a more formal review policy should be something you should look into. If you are keeping the item, you should disclose that in your review.

Do I Get to Keep The Product I am Reviewing?

That’s going to vary from offer to offer. Sometimes you will sometimes you won’t. That information should be communicated to you before hand. If you do keep the item you are responsible for any tax liabilities that are incurred. If you do keep the item we recommend that you disclose that fact in your review.

These changes bring the site into line with the practices of mainstream media. There are still inherent biases in letting companies choose which bloggers will review their items and in not publishing negative reviews, as strongly suggested by the FAQ. But those practices closely parallel the practices of mainstream publications that vie for the advertising dollars of the same companies whose products they are reviewing and that avoid publishing negative reviews.

Update on Chinese Circumvention Tool Snooping

20-Jan-09

Peter Li from the Global Internet Freedom Consortium has responded in the comments to my post about snooping by Chinese circumvention tools:

We apologize for the confusion here. The anti-censorship ranking service is provided by one of the GIFC partners. It only publishes the popularity ranks of destination websites users visit through our anti-censorship tools. It is similar to alexa.com but is only limited to anti-censorship web traffic.

The ranking service is not authorized to access, nor can it access, the data users transmit on the wire. It is not authorized to release logs containing information on the websites any individual user visits either.

The FAQ for the ranking service was not written properly, as originally “user” there meant website owners who may be interested in getting detailed statistics on how their websites are visited through our anti-censorship tools. We apologize that we have overlooked the wording.

The GIFC partner who runs the ranking service, the World Gates’ Inc, has been notified, and that FAQ entry has been removed. Thank you for discovering the problem.

Peter Li
Global Information Freedom Consortium

Also, Rebecca Mackinnon has written an excellent followup to the post that includes a response from Bill Xia of Dynamic Internet Technologies / Dynaweb that ‘DIT never gives out “personal-identifying user data”‘ and the following quote from Peter Li:

Yes, in some cases FBI asked us to provide logs for certain websites or destination IPs in some particular time periods, for example, they would request something like the original IPs who visited xyz.com at Jan 12, 2007, 12:20-30 EST, and the visited web pages. We provided such information as we feel we are obligated to work with law enforcement agencies in the free world.

Note that the above quote does not imply any sort of quid pro quo for FBI access to data. If Dynaweb is storing the data about individual users, they are required by U.S. law to give access to that data in response to government warrants and subpoenas.

Rebecca also gets the issue of the trust invested in circumvention tools precisely right:

The moral of this long story is important: when using circumvention tools, make sure you understand enough about how they work, what they’re meant to be used for, and who runs them, so that you’re not taking a leap of faith with people you would rather not trust.

The decision about who to trust is a personal one: I am more inclined to trust a VPN operating in the U.S. which is subject to FBI requests than a Beijing Telecom connection subject to Beijing public security bureau requests, but that’s just me. Other people might feel very differently and make different choices. Some people may feel very comfortable trusting the Falun Gong… others, well, might not… It appears that the VOA, RFA, and HRIC have decided to trust them and to recommend these services to their users.

Where does this leave the issue?

I’m happy that the data is no longer for sale on the website, but given all of these factors, I’m still concerned with the amount and sensitivity of the data being stored, the lack of disclosure to users about what data is being stored and how it is being used, and the care with which the data is being protected.

I want to make clear first that I am not attacking the motives of the developers of these tools. I have every reason to believe that the people building, distributing, and running these tools are doing so in honest resistance to the restrictive Internet policies of the Chinese government. I should have made that fact clear in my original post. I don’t think anyone was selling data to make a quick buck. I think any money made out of any hypothetical sale of personal data would have been plowed back into the circumvention projects.

Still, I am somewhat skeptical of Peter’s explanation that the issue was merely confusion arising from a misunderstanding of the word “user.” The key sentence seems pretty clear to me: “But data that can be used to identify a specific user are considered confidential and not shared with third parties unless you pass our strict screening test.” To the degree that websites are “identified,” they are already identified in the public aggregate data (google.com, live.com, etc). What additional, confidential data would be published about a website? I think it more likely that the confusion here is between the various projects contributing data and the ranking.edoors.com site displaying the data. In any case, the faq entry in question has now been removed from the site, so if they were offering to sell data, they are not anymore.

But Peter’s further comment about sharing data with the FBI indicates that, whether or not they are actively selling individual user data, they are definitely storing the data on an individual level. This fact alone is cause for concern, or at least for disclosure. There is no law in the U.S. that requires storage of web browsing histories, though the EU data retention directive does require that EU ISPs store the source and destination IP address of every Internet communication. The data flowing over the networks of these circumvention tools is particularly sensitive, since most of the users of the tools are breaking the laws of their countries merely by using them. Any data that is stored can be shared, stolen, subpoenaed, warranted, and otherwise distributed. The fact that some or all of the GIFC circumvention tools are storing browsing histories of individual users vastly increases the level of trust those users are investing in the tools, not just to refrain from intentionally misusing the data but also to safeguard it from attack or from misuse by partners trusted with the data. The current confusion over what data is being shared with the ranking service and what the ranking service is doing with the data is a demonstration of the inherent dangers of storing and sharing the data, even with trusted partners.

Compounding this problem is the fact that none of these tools has anything that even looks like a privacy policy. U.S. style privacy policies have many, many problems, but they do provide some baseline view of what data an organization is collecting and what it is doing with the data. These tools should at the very least be clear to their users what data they are storing, whether or not they are storing data at an individual level, and with whom and under what circumstances they are sharing the data. In general, there’s nothing wrong with collecting personal data as long as you are explicit about what data you are collecting and what you are doing with the data. None of the involved tools (freegate, gpass, firephoenix) makes any attempt to disclose what sort of data they are collecting and storing. And they all make broad claims about protecting the anonymity of their users.

A user should be able to make an informed decision between using a tool that tracks her activity (like dynaweb, gpass, and firephoenix) and a tool that does not (like anonymizer). Note that this is not a personal recommendation on my part to use any tool over any other. Lots of folks have responded to my original post by saying “See, you should use Tor!”. I think Tor is a great project, but without going into depth, it is very open about the ways that it does and does not protect the privacy of its users. As Rebecca says, before using a tool, you should be aware of how it works and what it is doing with your data and then make your decision about what and whom to trust. But projects have to disclose what they are doing with their users’ data for users to be able to make this choice.

Update: Edited to remove sloppy wording that wrongly implied connection between State Department funding and access to data.

Popular Chinese Filtering Circumvention Tools DynaWeb FreeGate, GPass, and FirePhoenix Sell User Data

09-Jan-09

Update: The site hosting the data for these tools has now removed the faq entry offering to sell the data. Please read my subsequent update for responses from the tool developers and further thoughts.

Three of the circumvention tools — DynaWeb FreeGate, GPass, and FirePhoenix — used most widely to get around China’s Great Firewall are tracking and selling the individual web browsing histories of their users. Data about aggregate usage of users of the tools is published freely. You can see, for example, that the three sites most visited by users of these circumvention tools are live.com, google.com, and secretchina.com. Aggregate data like this is a terrific resource for those of us interested in researching circumvention tool usage, and not much of a privacy risk for the circumventing users if it is only stored (as well as displayed) in the aggregate.

But the ranking site also advertises a pay service through which you can get not only much more data, but data about individual users. The site’s FAQ states:

Q: I am interested in more detailed and in-depth visit data. Are they available?

A: Yes, we can generate custom reports that cover different levels of details for your purposes, based on a fee. But data that can be used to identify a specific user are considered confidential and not shared with third parties unless you pass our strict screening test. Please contact us if you have such a need.

So they are happy to provide you with specific user data, but only if you double super promise not to share it and only if they really like you.

It’s hard to state how dangerous this practice is. These tools are acting as virtual ISPs for millions of users. All circumvention tools work by proxying the data of their users through some third machine, so all circumventing traffic is going through that third party machine. Selling the browsing histories of those users is like an ISP selling the browsing histories of its users, which is a big step beyond what companies like NebuAd and Phorm were / are trying to do. NebuAd and Phorm are at least adding a variety of pseudonymity and privacy layers to their tracking, whereas dynaweb et al. are evidently directly storing (and selling) the full, individually identifiable browsing histories of their users.

And the data about circumventing users is much more sensitive than the data about most ISP users. These are the histories of users browsing sites that are not only blocked (and therefore mostly sensitive in one way or another) but blocked by an authoritarian country with an active policy and practice of persecuting dissidents. The mere act of anyone, let alone projects proclaiming themselves for internet freedom, storing this data is very bad practice. Any data that is stored can potentially be shared or stolen. The best way to make sure that dangerous data like this does not get into the wrong hands is not to store it in the first place.
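To make the difference between aggregate and individual storage concrete, here is a minimal Python sketch of a proxy-side log that can produce a "most visited sites" ranking without ever keeping a per-user history. It is a hypothetical illustration, not a description of how any of these tools works: the class name AggregateOnlyLog and its methods are invented for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


class AggregateOnlyLog:
    """Hypothetical proxy log that keeps only per-site visit counts."""

    def __init__(self):
        # Maps destination hostname -> number of visits. Nothing else
        # is stored, so there is no per-user history to sell or steal.
        self.visits = Counter()

    def record(self, client_ip, url):
        # client_ip is deliberately discarded: it is never written
        # anywhere, so a visit cannot be tied back to a user later.
        host = urlsplit(url).hostname
        if host:
            self.visits[host] += 1

    def top_sites(self, n=3):
        # Enough to publish a popularity ranking of destination sites.
        return self.visits.most_common(n)


log = AggregateOnlyLog()
log.record("10.0.0.1", "http://google.com/search?q=news")
log.record("10.0.0.2", "http://google.com/")
log.record("10.0.0.1", "http://live.com/")

print(log.top_sites())  # [('google.com', 2), ('live.com', 1)]
```

A log like this could still feed a public ranking page, but a request for "the original IPs who visited xyz.com" could not be answered from it, because that information was never kept.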

But these projects are not only storing the data. They are actively offering to sell it. None of the projects has anything like a privacy policy that I can find, and none of them provides any notice anywhere on the site or during the installation process that the project will be tracking and selling user browsing activity.* But all of the sites have deceptive language like this from the FirePhoenix home page:

Secure

FP encrypts all your network traffic. No third-party can recognize what Internet information is flowing in/out of your computer, even if they are monitoring your traffic.

In fact, third parties can recognize the data flowing in/out of a computer running FirePhoenix by buying that data and promising not to share it with anyone else.

This sort of thing demonstrates that there is no way to eliminate points of control from a network. You can only move them around so that you trust different people. In this case, Chinese users are replacing some of the trust in their local Chinese ISPs with trust in the circumvention projects through which they are proxying their traffic. But those tools are acting as virtual ISPs themselves and so have all the potential for control (and abuse) that the local ISPs have. They can snoop on user activity; they can filter and otherwise tamper with connections; they can block P2P traffic.

These particular virtual ISPs have chosen to support themselves by selling user data. Lots of folks rely on personal VPNs to circumvent or otherwise secure their connections, but those VPNs are not inherently any safer than the local ISPs through which they are tunneling. The popular VPN Relakks, for example, is hosted in Sweden, where a law passed last year requires that the federal government monitor all data entering and leaving the country, including foreign users of the Relakks VPN. Some circumvention projects like Psiphon use a peer to peer model in which volunteers host proxies (ideally a volunteer known by the circumventing user) and others like Tor use algorithms to try to ensure trust of the proxies, but all of them require that the user trust some other person or some code with all of her circumventing traffic.

*: installation language not verified for FirePhoenix, which has only a Chinese interface.

Viral Conversations: Community Based Production of Biased Reviews

26-Nov-08

Update: Viral Conversations has changed the language in their FAQ not to encourage gifting of reviewed items and to encourage reviewers to disclose whether they are keeping reviewed items.

I just ran across a new site called Viral Conversations. The basic idea is to serve as a brokerage between companies with products they want reviewed and bloggers who want to review them. Sony submits an offer for a blogger to review a camera, some bloggers submit applications to review the camera, Sony chooses one or more to write reviews, sends them a camera, and the blogger writes a review.

There’s a customer driven version of this basic idea that could be community empowering and in the long term best interests of the company. Maybe the company offers to lend two cameras to qualifying bloggers, and the users of the site vote on which bloggers get to review the cameras. This way, there would be no chance for Sony to pick only friendly reviewers, and reviewers would not get paid for reviews with merchandise. I’m sure there would be problems with this approach, but it’s possible at least to think hard about how to create such a site in a way that would produce honest, community driven reviews of the products. In fact, such a system could attack existing problems with generating honest reviews through advertiser driven media.

Unfortunately, Viral Conversations is not such an honest attempt. In fact, it not only ignores the problem of biased selection of reviewers, but it is breathtakingly bold in the corruption of its system for generating reviews. Consider the following from the FAQ:

[for advertisers]

What Kind of Reviews Can I Expect?

We encourage all of our bloggers to be as honest as possible. Sometimes there will be negative aspects or criticisms, as this is to be expected. This not only makes the review more believable but gives you suggestions on how you can improve your product.

What if the Review is Negative?

We strongly suggest that all bloggers contact you beforehand if the review is more negative than positive. Hopefully this gives you the opportunity to fix the problem. If a resolution can’t be reached we suggest that the review not be published. We can’t force anyone to not publish or take down a negative review, but we will try to help.

Do I Have to Let the Bloggers Keep The Item?

No, you don’t have to let the bloggers keep the item, but we do think it’s a good idea and really nice thing to do. It’s going to depend on a number of factors such as cost and shipping difficulty. Letting the bloggers keep a $50 coffee maker is probably a no brainer, but you may feel a little differently about an $1500 espresso machine. Be as clear as possible in the beginning to avoid any confusion.
[for bloggers]

Does the Review have to be Good?

No, the review should be honest. Most would agree that the IPhone is a great product, although not everyone likes the touch screen, and it’s safe to say everyone wishes the battery would last longer. These do not make the IPhone a bad product. Talk about the product’s good points, and mention areas where it needs improvement. If you find that your review is more negative than positive or almost all negative, please put on the brakes before you publish. Send an email or pick up the phone and let someone know first.

Do I Get to Keep The Product I am Reviewing?

That’s going to vary from offer to offer. While we recommended that merchants who use our service let you keep the product or item, it’s not always possible. Sometimes it’s a monetary issue, other times it’s a limited availability issue. That information should be communicated to you before hand. If you do keep the item you are responsible for any tax liabilities that are incurred.

So while Viral Conversations can’t absolutely guarantee good reviews or that reviewers get paid by companies for good reviews, they strongly suggest 1) that the companies give the reviewed products to the reviewers and 2) that the bloggers only publish positive reviews. Okay.

And what about a disclaimer from the blogger about the fact that she is basically being paid to write a good review?

Do I need a Disclaimer on My Post?

You don’t need a disclaimer but we very strongly recommend you do it to be upfront and honest with your readers. It could be something as simple as “The John Smith Camera Company sent me their new ABC-123 DLSR camera to review”. If you do a lot of reviews on your website a more formal review policy should be something you should look into.

Assuming the advertisers and bloggers follow the suggested practices, the formal review policy should presumably say something to the effect of “You should trust nothing I write in this blog because I’m being paid with in kind merchandise to write only positive reviews”?

Postscript:

I enjoy reading outrageous terms of service. Viral Conversations has a great bit in their terms of service:

Viral Conversations website disclaims any and all responsibility or liability for the accuracy, content, completeness, legality, reliability, or operability or availability of information or material displayed in the ViralConversations.com website pages. [emphasis mine]

Even though I live and work with lawyers, I am not one myself. Still, I’m pretty sure it’s not possible to disclaim all responsibility for the legality of my actions. If it is, I hereby disclaim all liability for the legality of any and all actions committed by me, including swiping your shiny new iphone. …

Google Ad Planner: Advertising Surveillance of the Internet

25-Nov-08

For a long time, the only free source of data about site traffic online has been the Alexa Top Sites list, but the data for the Alexa list is based on the very skewed sample of folks who run the Alexa toolbar, and who the heck runs the Alexa toolbar these days? When I’ve needed data about the most popular sites in a country, I’ve had to use the Alexa data, but only holding my nose with knowledge that the data at best represents a wild guess. There have been better sources of data, but they were all closed, expensive, and generally collected in at least mildly sketchy ways.

Google’s ad planner tool moves dramatically toward filling this big hole in public knowledge about web site traffic. To try it out, visit the above url and click on the ‘Begin Research’ button.

The ad planner tool is:

a free media planning tool that can help you identify websites your audience is likely to visit so you can make better-informed advertising decisions.

With Google Ad Planner, you can:

* Define audiences by demographics and interests.
* Search for websites relevant to your audience.
* Access aggregated statistics on the number of unique visitors, page views, and other data for millions of websites from over 40 countries.
* Create lists of websites where you’d like to advertise and store them in a media plan.
* Generate aggregated website statistics for your media plan.

What the tool actually does is provide a list of total traffic numbers for the 250 most visited sites that meet a number of different demographic queries, including by country and by site type. This lets you, for instance, find out the 250 most visited sites in India, along with the total traffic and number of unique visitors for each site. Or the 250 sites most visited by women. Or by women between 25 and 34. Or by women between 25 and 34 who make more than $150,000 a year:

It’s hard to overstate the power of this tool and the orders of magnitude improvement it is over the Alexa data. You can filter the data by category (newspapers, liberal blogs, flower stores, etc, though the categories seem very poorly assigned). You can choose just sites that allow advertising or all sites (note that the tool shows just advertising sites by default). You can choose sites visited by users who have visited some other site. Or sites visited by users who have searched for some word.

Did you know that the New York Times has twice as many visitors (21 million) as the next closest newspaper, the Washington Post (11 million)? That the Washington Post has half again as much traffic as the next newspaper? That the Huffington Post has basically as much traffic (6.8 million) as every newspaper but the New York Times and the Washington Post? That Daily Kos is less than a quarter of the size of the Huffington Post? That unlike in any of the other 20+ included countries, only 2 of the top 25 sites in China are U.S. hosted sites (yahoo at #8 and microsoft at #23)?

And the data used for the tool is the Good Stuff:

How is the data in Google Ad Planner generated?

Google Ad Planner combines information from a variety of sources, such as aggregated Google search data, opt-in anonymous Google Analytics data, opt-in external consumer panel data, and other third-party market research. The data is aggregated over millions of users and powered by computer algorithms; it doesn’t contain personally-identifiable information.

In other words, they use all of the very expensive, somewhat-to-very privacy questionable methods that we privacy interested folks worry about. They tap into their own extensive search logs, the even more extensive data from the adwords system, the extensive data from their analytics tool, and “market research” companies that install spyware that is difficult to distinguish from malware.

But hey, now at least we get the data.

What’s fascinating about this tool is that it’s a market research tool for folks who want to figure out what list of sites to advertise on. It’s little known because it has not been marketed like google trends as a general use tool, even though it is hugely useful as such. In fact, the terms of service only explicitly allow that: “You may use the Program to choose sites on which to target ads” (oddly, the terms also mandate that “The existence of this Program will be deemed Confidential Information” and must be protected with stringent security safeguards, notwithstanding the publication of the tool by google). The fantastic power of this tool for monitoring and understanding the Internet and the wide and deep and invasive methods used to collect the data for the tool point to the very strong connection between surveillance and advertising. The release of this tool and its data output as an ‘ad planner’ shows that in the world of adwords, doubleclick’s use of near universal third party cookies, and Phorm’s tapping of UK Internet connections, advertising has become very difficult to distinguish from surveillance.

Where are the AdWords jingles?

12-Nov-08

I’ve been reading up on the history of media and advertising lately, including a book by Stephen Fox on the history of advertising called Mirror Makers. Fox’s core argument is that advertising strategies are cyclical over time, varying between straightforward, plain text advertising that describes the price and value of the product to atmospheric advertising that attempts to attract attention and build up the reputation of a brand. He includes lots of examples of early advertising, including the following jingles about “Sunny Jim” used to sell Force cereal in 1902:

Jim Dumps was a most unfriendly man,
who lived his life on the hermit plan;
In his gloomy way he’d gone through life,
And made the most of woe and strife;
Till Force one day was served to him —
Since then they’ve called him “Sunny Jim.”

Jim Dumps a little girl possessed,
Whom loss of appetite distressed;
“I des tan’t eat!” the child would scream;
Jim fixed a dish of Force and cream —
She tasted it — then joy for him —
She begged for more from “Sunny Jim.”

The Sunny Jim character became a national cultural icon through these jingles. “A giant likeness adorned the sides of two eleven-story buildings in New York,” says Fox, “Songs, musical comedies and vaudeville skits were written about him. Anybody with a cheery personality and the name of James risked being called Sunny Jim.” Unfortunately, the jingles did not help sell much cereal, and they were eventually replaced by straightforward descriptions of the nutritional and economic advantages of the cereal. Some other jingle campaigns in the era worked, others like Force did not.

These jingles were published in written form, obviously because there was no radio in 1900. The idea of advertising as poetry seems quaint today, but is actually more possible in the age of the text only AdWords format. It’s striking that AdWords today consists only of straightforward sells. See for instance the following AdWords ads currently running for ‘cereal:’

The strict text limits of the format might be a factor in discouraging jingles, but that same constraint could also serve as a creative force. It would certainly be possible to create an interesting, compelling jingle campaign one line of text at a time, and such an approach would encourage folks to pay more attention to the easily ignored AdWords boxes.

One possibility is that advertisers feel they don’t need jingles to capture attention when their ads are well targeted, as with AdWords. Another is that companies advertise directly through AdWords rather than through an ad agency and so don’t have the access to creative advertising expertise. Another is that AdWords is just young and hasn’t hit the jingle cycle yet.

Whatever the reason, an effect of the lack of AdWords jingles is that the cultural impact of AdWords is mostly limited to the impact the ads have on the creation of content, rather than on the content of the ads themselves. This is a significant divergence from most other modern mass media forms of advertising, in which the ads themselves are arguably as impactful as their impact on the content supported by them.

Surveillance Project Blog

12-Nov-08

We’ve started a blog for the Berkman Surveillance Project. I’ve been posting all of my surveillance stuff there, including the following stories:

Google Privacy Interview

30-Sep-08

Here’s an interview on google privacy issues I did last week for IT Business Edge.