Using SSL to Prove Document Authenticity

This blog post is an idea that I’ve been kicking around for a while but haven’t had the time to research or implement. I’ve finally decided just to post it speculatively. I’m really hoping to get feedback from those in the community more knowledgeable about SSL than I am. Note: This is a relatively geeky topic. If you don’t understand what https:// and SSL are, this post won’t make much sense…

Introduction

Does anyone know anything about the internals of https? I was wondering whether, if you download a document over https, there is any way to prove to a third party that it actually came from the web site you claim it came from.

For example, let’s say that Alice downloads doc.pdf from https://foobar.com/doc.pdf. https gives Alice assurance that doc.pdf really came from foobar.com (assuming that the certificate is legitimate). But assuming doc.pdf does not have a digital signature, if Alice simply sends the downloaded file to Bob, he has no proof that the file actually came from foobar.com. (Obviously, the ideal solution would be for the maintainer of foobar.com to digitally sign the PDF file. But few web sites digitally sign the files they distribute, and individual users often have no means of convincing a web site to do so.)

My question is whether there is any way for Alice to prove to Bob that she really obtained the file from foobar.com. I thought that it might be possible for Alice to prove the file’s origin by sending some of the raw network traffic that established the SSL connection along with the file. (I’m using a PDF file to simplify the example, but presumably the same issues would apply to a web page.)

Use Cases

PACER is an online service used by the United States federal courts to provide access to court records and documents. The documents on PACER are generally thought to be in the public domain but remain behind a paywall. Efforts such as the PACER Recycling Project and RECAP allow users to upload PDF documents obtained from PACER to a central server where the documents can then be freely downloaded by others. However, while PACER uses SSL, it does not provide digitally signed PDF files. Thus users currently have no way to prove that the documents they upload really came from PACER.

Another use case is as a replacement for web screen shots. Because web pages can be easily altered or taken down, screen shots are often offered as “proof” that a web page used to exist even if it has since been altered or removed. For example, this CNET news story describes how pranksters from 4chan retaliated against AT&T for blocking their site by posting a fake report saying that AT&T’s CEO had died. The story includes this screen shot of the pranked web page prior to its removal. Of course, screen shots can be easily faked or altered using tools such as Photoshop or just by saving and editing the HTML. Presumably web screen shots posted by CNET are relatively trustworthy, but what about screen shots posted by unknown users?

Ideal Solution

I envision a Firefox extension that would allow a user to easily create an archive bundle for an https: web page, containing the page and the SSL information proving its legitimacy. (Obviously this would need to work for single files as well as web pages.) This bundle would allow other users to view the web page or file as it existed and would provide easily verifiable proof that it really came from the site in question.
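To make the idea of an archive bundle a little more concrete, here is a rough Python sketch of what one might contain. Everything in it is an assumption on my part: the `ArchiveBundle` class, its field names, and the `make_bundle` and `to_json` helpers are hypothetical and do not correspond to any existing extension or format. It only shows how the pieces could be packaged together. It does not verify anything, and whether the captured traffic could actually convince a third party is exactly the open question of this post.

```python
import base64
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ArchiveBundle:
    """Hypothetical container for a downloaded page or file plus SSL evidence."""
    url: str                          # where the document was fetched from
    retrieved_at: str                 # ISO 8601 timestamp supplied by the caller
    body_b64: str                     # the document itself, base64-encoded
    body_sha256: str                  # hex digest so the body can be checked
    certificate_chain_pem: List[str]  # server certificates as PEM strings
    raw_traffic_b64: str              # captured SSL traffic, base64-encoded


def make_bundle(url: str, retrieved_at: str, body: bytes,
                certificate_chain_pem: List[str],
                raw_traffic: bytes) -> ArchiveBundle:
    """Package the document and the captured evidence into one object."""
    return ArchiveBundle(
        url=url,
        retrieved_at=retrieved_at,
        body_b64=base64.b64encode(body).decode("ascii"),
        body_sha256=hashlib.sha256(body).hexdigest(),
        certificate_chain_pem=list(certificate_chain_pem),
        raw_traffic_b64=base64.b64encode(raw_traffic).decode("ascii"),
    )


def to_json(bundle: ArchiveBundle) -> str:
    """Serialize the bundle so it can be sent to someone else."""
    return json.dumps(asdict(bundle), indent=2, sort_keys=True)
```

Alice would call `make_bundle` with the URL, the bytes she downloaded, and whatever the extension captured, then send the output of `to_json` to Bob.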

My Questions for the SSL Knowledgeable

Is this doable at all?

Screen shots are trivial to fake. If this approach can’t provide perfect proof of the origin of a document, how much more assurance would it give you than just a screen shot?

Would releasing the raw https traffic also mean that Alice would be releasing her user name and password?

A minor concern is that a web site hosting or displaying a particular page is slightly different from the web site signing a file. Furthermore, there may be issues with XSS vulnerabilities that allow attackers to make an https web site display arbitrary content. However, XSS attacks are already a problem with screen shots being passed around, and XSS-altered pages could probably be detected by viewing the HTML source.

But Not All Web Sites Use SSL

It has been repeatedly shown that Web 2.0 applications such as Gmail and Facebook cannot be used securely over an unencrypted connection. For example, hijacking the account of a Facebook user on the same network is trivial. Perhaps I’m being overly optimistic, but I believe that once these vulnerabilities become more widely known and attack scripts and exploits become widely available, web applications will move to SSL as the default or at least offer https as an option. (Gmail already has an option to enable https, though it is buried deep within the settings.)

Please Comment

There you have it: my first real blog post.  Please let me know what you think.

Update December 13, 2009

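Unfortunately, it appears that this won’t work. The basic problem is that SSL protects the data it carries with symmetric session keys that the client and server share. The server’s certificate is only used while setting up the connection, so nothing in the recorded traffic is tied to the document in a way that only the server could have produced, and the client, who holds the same keys, could easily forge messages. (Initially, technically unsophisticated users might not be able to forge messages and authenticate them with the key, but someone would probably develop an automated tool to do it.) I still hope that at some point a standardized way to show what a web page showed previously will emerge that’s harder to forge than screen shots. Many thanks to Paco Hope and his colleagues at Cigital for providing feedback on this.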
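The shared-key problem described in this update can be illustrated with a few lines of Python. This is a deliberately simplified sketch and not how SSL is actually implemented: real SSL derives several keys during the handshake and adds sequence numbers and encryption, but the client knows all of those keys too. The names `session_key`, `mac`, and `verify` are mine, and HMAC-SHA256 simply stands in for whatever message authentication the connection negotiated.

```python
import hashlib
import hmac
import os


def mac(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag the way either end of the connection would."""
    return hmac.compare_digest(mac(key, message), tag)


# Stand-in for the session key that BOTH the server and Alice hold
# once the connection has been set up.
session_key = os.urandom(32)

# What the server actually sent, authenticated with the shared key.
genuine = b"The real contents of doc.pdf"
genuine_tag = mac(session_key, genuine)

# What Alice invents afterwards, authenticated with the very same key.
forged = b"Contents that Alice made up"
forged_tag = mac(session_key, forged)

# Bob, given the key and the traffic, cannot tell the two apart.
print(verify(session_key, genuine, genuine_tag))  # genuine message checks out
print(verify(session_key, forged, forged_tag))    # so does the forgery
```

Both checks succeed because a tag made with a shared key only shows that someone who knew the key produced the message, and Alice is one of the two parties who knew it. A digital signature made with a private key that only the server holds would not have this weakness, which is why a signed file remains the ideal solution.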