Paleowebic

March 23, 2008

I’ve been trying lately to look up stuff online that happened before the Web. It’s like looking for fossils in the atmosphere. And the paleowebic tools are pretty sucky. Take for example the San Jose Mercury News archive search. I happen to know there was a story in the business section of the paper in June 1986 about Hodskins Simone & Searls, the advertising agency in which I was a partner for many years. If I look up “hodskins”, nothing comes up. If I search from 1985 to 2008, three items come up, none obviously relevant. (Well, one might be, but to find out I have to create an “archive account”, specifying a payment method, before proceeding. Kind of a high-friction system.)

It’s not that I want to pay nothing for putting the Mercury to the trouble of providing a service that costs them more than nothing to run. But the complete absence of a widespread and easy-to-use system for perusing archival material from multiple sources is a problem I’d like to help the market solve.

I do have ideas. Stay tuned.

VRM

Posted by: Doc Searls

2 responses to “Paleowebic”

Andrew Leyden

March 23, 2008 at 8:09 pm

This is a common complaint: the dearth of pre-1994 material on the web. It’s such a shame, because so much of that information is already out there, just locked behind stupid walls and not properly indexed.

Another growing problem is the reordering of Google results to put new content first. For example, I was doing some research on President Clinton’s attacks on the privacy of Americans. I specifically recall, from when I worked on Capitol Hill, saying to myself as one of these bills passed, ‘If the US ever goes Orwellian, it will be these bills people look back on as laying the groundwork.’

As we have moved more and more toward the always-observed society, I wanted to trace that anecdote back to its pre-1994 source, so I did some Google searching. Not only was there a lack of information about the bill that passed, but new stories about Hillary Clinton and her views on the Patriot Act, privacy, security and so on took up the first 10 pages or so of results.

You know, one place to start in cataloging the old stuff would be the ‘morgues’ of dead newspapers. For example, the ‘morgue’ of the Washington Star newspaper sits in dusty file cabinet after dusty file cabinet in the MLK Public Library in downtown DC.

Russell Nelson

March 24, 2008 at 1:04 am

Doc, see NNYLN’s Newspaper archive:
http://news.nnyln.net/
They have a microfilm scanner, and they run the scans through an OCR program, which (naturally) doesn’t get everything right. But if an article is about something, chances are that thing will be named several times in the article, and the OCR will get at least one of them right. The PDF that gets generated then has both the text and the scanned image, with the text located where it appears in the image.

It’s a really nice system.
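
The point about repeated names can be put in rough numbers. This is only a back-of-the-envelope sketch in Python: the 80% per-mention accuracy is a made-up figure, the function name is hypothetical, and it assumes each OCR reading of the name succeeds or fails independently, which real scans may not honor (a smudged page hurts every mention at once).

```python
def chance_of_clean_hit(per_mention_accuracy: float, mentions: int) -> float:
    """Probability that OCR reads a name correctly at least once.

    Assumes each of the `mentions` readings is independent and correct
    with probability `per_mention_accuracy` (an illustrative assumption).
    """
    if not 0.0 <= per_mention_accuracy <= 1.0:
        raise ValueError("per_mention_accuracy must be between 0 and 1")
    if mentions < 0:
        raise ValueError("mentions must not be negative")
    # The search misses only if every single reading is wrong.
    return 1.0 - (1.0 - per_mention_accuracy) ** mentions


if __name__ == "__main__":
    # Hypothetical accuracy of 0.8 per mention, for illustration only.
    for mentions in (1, 3, 5):
        print(mentions, chance_of_clean_hit(0.8, mentions))
```

Under those assumptions, even a mediocre OCR pass becomes quite searchable once a name shows up a handful of times in the same article.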

Doc Searls Weblog
