Curley’s CALI Keynote and the Future of Open Access Law

Rob Curley’s keynote speech from Day 2 of the recently concluded 2007 CALI Conference for Law School Computing has not yet been posted to his page at the conference wiki as I write this, which is a shame. Keep checking back for it; when it eventually goes up, it’ll be worth your time. Rob is the self-styled “head dork” at the online unit of my old hometown rag. You can see his and his team’s handiwork at onBeing and the Teen Shopping Project’s Interactive Mall Map, two recent WaPo interactive features. The title of Rob’s talk was “Hyperlocality,” by which he simply meant that newspapers’ online presences shouldn’t just duplicate the paper’s printed content in electronic form; they should instead exploit the unique capabilities of the Web to do things that would be impossible in print.

Rob demoed a site he had built for a former employer that was an obsessive sports fan’s vision of nirvana. Data about every game, team, and player that the site covered was culled from multiple sources, sliced and diced by custom software, and stuffed into a gigantic relational database that was then parsed to respond to any sort of query a user could conceivably formulate, generating web pages on the fly. Clicking on any team’s name brought up a roster of players and a history of every game the team had played. Clicking on any individual game brought up a screen of photos and press coverage from that event. Clicking on any player’s name brought up a biography page with links to any press coverage that player had received along with their individual statistics and record with every team they’d ever been on. Each of those pages, in turn, was stuffed with links to other teams, other games, other players, and on and on, far exceeding in scope and detail anything you could ever publish in a hard-copy newspaper.
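To make the idea concrete, here is a minimal sketch of the kind of relational layout that could sit behind pages like these. It is not Rob's actual schema, which I haven't seen; every table, column, team, and player name below is made up for illustration, and it uses Python's built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical sports schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE team   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- roster links players to teams, one row per season played
        CREATE TABLE roster (
            team_id   INTEGER REFERENCES team(id),
            player_id INTEGER REFERENCES player(id),
            season    INTEGER
        );
        CREATE TABLE game (
            id         INTEGER PRIMARY KEY,
            played_on  TEXT,
            home_id    INTEGER REFERENCES team(id),
            away_id    INTEGER REFERENCES team(id),
            home_score INTEGER,
            away_score INTEGER
        );
        """
    )
    # Invented sample rows, purely for the demo.
    conn.executemany("INSERT INTO team VALUES (?, ?)",
                     [(1, "Hawks"), (2, "Lions")])
    conn.executemany("INSERT INTO player VALUES (?, ?)",
                     [(1, "Ann Lee"), (2, "Bo Cruz"), (3, "Cy Ames")])
    conn.executemany("INSERT INTO roster VALUES (?, ?, ?)",
                     [(1, 1, 2006), (1, 2, 2006), (2, 3, 2006), (2, 1, 2007)])
    conn.executemany("INSERT INTO game VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, "2006-09-01", 1, 2, 3, 1),
                      (2, "2006-09-08", 2, 1, 0, 0)])
    return conn


def team_page(conn, team_id):
    """Data for a 'click on a team' page: its roster and every game it played."""
    roster = [row[0] for row in conn.execute(
        "SELECT DISTINCT p.name FROM player p "
        "JOIN roster r ON r.player_id = p.id "
        "WHERE r.team_id = ? ORDER BY p.name", (team_id,))]
    games = conn.execute(
        "SELECT played_on, home_id, away_id, home_score, away_score "
        "FROM game WHERE home_id = ? OR away_id = ? ORDER BY played_on",
        (team_id, team_id)).fetchall()
    return {"roster": roster, "games": games}


def player_page(conn, player_id):
    """Data for a 'click on a player' page: every team stint, by season."""
    return conn.execute(
        "SELECT t.name, r.season FROM roster r "
        "JOIN team t ON t.id = r.team_id "
        "WHERE r.player_id = ? ORDER BY r.season", (player_id,)).fetchall()
```

Calling `team_page(build_demo_db(), 1)` returns the roster and game list for one team, and `player_page` does the same for a player's history. Each "click" on the site would correspond to one such query, with the results rendered into a web page on the fly.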

As a tech demo, it was impressive. But I think the demo made a more important point that was less about the technology than about the managerial mindset that brought it into being. Rob’s sports page wasn’t anything that would ever have been created by somebody who just wanted to put the newspaper’s sports section online. It was, instead, an example of what you get when you turn fiendishly creative geeks loose with a crate of Red Bull and some high-end hardware and let them see what they come up with. There are lessons here for how law schools, as culturally conservative institutions, manage information technology, but I’ll leave those for another day.

No, what I really wanted to know after watching Rob’s demo was: how would these guys solve the open access problem? It’s a problem I’ve blogged about here seemingly ad nauseam (check our open access tag for a collection of highlights).

You might try this analogy on for size: the federal judiciary is like a collection of teams (courts) composed of players (judges) who come and go over time and assemble in different combinations for individual games (cases) each of which ends up producing some sort of result (an opinion). This process yields lots of raw data that is just waiting to be parsed, remixed, and reassembled by sufficiently clever coders.

Consider the federal appeals courts. All of them post their new opinions on the Web. (Want to see them all? Here you go: 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, 10th, 11th, D.C., Fed.) But the data isn’t as useful as it could be. It tends to be posted in PDF format, which is searchable but not easily indexed, and recreating the original source text from a PDF can require substantial extra effort. (To take only two common problems, PDF makes it difficult to extract the text of the courts’ opinions without also extracting extraneous information like running headers and footers, and the PDF format also tends to break the link between footnote text and the accompanying footnote reference in the body.) There’s also generally little accompanying metadata describing the opinions or giving the identities of the judges who participated. (The 8th Circuit is better than most here; their clerk’s office adds short summary descriptions of the holdings of each case to the information that is posted online. The 7th and 11th also deserve kudos for offering RSS feeds of their new opinions, which ought to be a standard practice for all the appeals courts.)
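The running header and footer problem, at least, lends itself to a simple heuristic. The sketch below assumes the text has already been pulled out of the PDF one page at a time by some other tool (that step is not shown), and it treats any first or last line that recurs on most pages as a running header or footer. The function names, the 60% threshold, and the sample case caption are all my own assumptions, not anything the courts publish.

```python
import re
from collections import Counter


def _normalize(line):
    """Collapse digit runs so 'Page 2' and 'Page 3' compare as equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_running_lines(pages, min_fraction=0.6):
    """Remove first/last lines that repeat across most pages.

    pages: list of strings, one per page of already-extracted text.
    Blank lines are dropped as a side effect of the cleanup.
    """
    page_lines = [[ln for ln in page.splitlines() if ln.strip()]
                  for page in pages]

    # Count how many pages share each (normalized) first or last line.
    counts = Counter()
    for lines in page_lines:
        if lines:
            counts.update({_normalize(lines[0]), _normalize(lines[-1])})

    threshold = min_fraction * len(pages)
    repeated = {text for text, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in page_lines:
        if lines and _normalize(lines[0]) in repeated:
            lines = lines[1:]          # drop the running header
        if lines and _normalize(lines[-1]) in repeated:
            lines = lines[:-1]         # drop the running footer / page number
        cleaned.append("\n".join(lines))
    return cleaned


# Invented three-page example: a caption header and a bare page number.
sample_pages = [
    "No. 06-1234  Smith v. Jones\nThe district court erred.\n2",
    "No. 06-1234  Smith v. Jones\nWe reverse.\n3",
    "No. 06-1234  Smith v. Jones\nSo ordered.\n4",
]
body_only = strip_running_lines(sample_pages)
```

A heuristic like this will misfire on very short documents or on pages whose real text happens to begin with the same words, and it does nothing for the broken footnote links, which would need a separate pass.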

There’s a project here just waiting for the right people to tackle it. The raw data — judicial opinions — are all there in the public domain, free to copy and reuse. Technologically, there’s no reason why court opinions can’t be given the same treatment that Rob and his team gave their collection of sports statistics — if I click on a judge’s name, I should be able to see a complete list of cases in which that judge has participated, statistics about how often they agree or disagree with other judges on the same court, perhaps even their biographies from their careers before elevation to the bench. Clicking on a court’s name could take you to a page listing all the judges of that court and a list of recent cases decided by the court, perhaps restricted by categories at a very general level (civil versus criminal) or with more granularity (copyright cases versus trademark cases). (I can get some, although hardly all, of this by subscribing to Westlaw or Lexis, of course; but in my ideal world, this information would all be as free as the source data on which it’s based.)
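As a rough illustration of how little code the judge-level statistics would take once the metadata exists, here is a sketch that computes pairwise agreement rates from panel votes. The data format, the function names, and the judges and cases themselves are invented; real opinions would first need their panel membership and votes extracted, which is the hard part.

```python
from collections import defaultdict
from itertools import combinations


def agreement_rates(cases):
    """Fraction of shared cases in which each pair of judges voted alike.

    cases: list of dicts like
        {"name": "...", "votes": {"Judge A": "majority", "Judge B": "dissent"}}
    """
    shared = defaultdict(int)   # cases the pair sat on together
    agreed = defaultdict(int)   # of those, cases where they voted the same way
    for case in cases:
        votes = case["votes"]
        for a, b in combinations(sorted(votes), 2):
            shared[(a, b)] += 1
            if votes[a] == votes[b]:
                agreed[(a, b)] += 1
    return {pair: agreed[pair] / shared[pair] for pair in shared}


def cases_for_judge(cases, judge):
    """Names of every case in which the given judge participated."""
    return [case["name"] for case in cases if judge in case["votes"]]


# Hypothetical three-judge panel and made-up case names.
sample_cases = [
    {"name": "Case 1", "votes": {"Judge A": "majority", "Judge B": "majority", "Judge C": "dissent"}},
    {"name": "Case 2", "votes": {"Judge A": "majority", "Judge B": "majority", "Judge C": "majority"}},
    {"name": "Case 3", "votes": {"Judge A": "majority", "Judge B": "dissent", "Judge C": "majority"}},
]
rates = agreement_rates(sample_cases)
```

In this toy data, Judges A and B agree in two of their three shared cases, while B and C agree in only one. Treating a vote as simply "majority" or "dissent" is a simplification; concurrences and partial dissents would need a richer encoding.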

So, who’s game to tackle something like this? LII, I’m looking at you here. Or perhaps Tim Wu? Dan Hunter? Wikilaw?

2 Responses to “Curley’s CALI Keynote and the Future of Open Access Law”

  1. You can get pretty close to all of that on Westlaw, the obvious difference being things that aren’t court document related (such as judges’ bios). Of course that isn’t the point; the point is open vs. closed 🙂

    Someone have $100K of seed capital or grant money just sitting around? I could do this without much difficulty.

    Now, what I would really like to see is someone with some clout and knowhow to spearhead a project standardizing the tech across all of the circuits. It seems to me that the easiest solution to the problem is to have a “CTO” for all of the circuits. With this type of leadership and a small budget it would seem quite easy to make a move towards making as much of the materials as possible available in an open standards/open access way…

  2. Thinking a little more about this, what would be really nice and unique is a service that allows you to apply tags to external text content. For example, if I knew of a story (or a case, it applies equally well), I could provide the link to the material and the service would allow me to apply tags to the source material such as a “fact” tag for things that are facts, etc… then it becomes easier to spider/index this material to build dynamic interfaces…