Interesting conversations

I should have reported on each of these the day it occurred, but, well, I didn’t. Sorry. (Note that the “we” in each case includes multiple members of the dev core.)

Last week we talked with David Smith and R. Manmatha from UMass about their work on identifying languages in scanned text. They are able to report what percentage of a work is in each of the supported languages (six at the moment), which would be very valuable information for the DPLA platform to make available. Moreover, when the metadata record for a work lists its languages, the percentage of “unknown” that their recognition software reports can indicate problems with the OCR.

We also spent a morning with Sebastian Hammer of Index Data talking about federated search. Sebastian favors a hybrid of federated and union search, to gain the performance of a union index while still being able to access real-time data. For the DPLA platform, the most relevant real-time data is perhaps the availability of works at local libraries.

This morning we spent an hour Skyping with Nasos Drosopoulos and Stefanos Kollias of the MINT project about whether and how to integrate their metadata mapping service into the platform’s ingestion processes. They’re going to send us some more literature, and we’re going to provide them with some sample data.

Last week, I had the wonderful opportunity to have dinner with MacKenzie Smith, who knows an enormous amount about what library tech has been tried and what’s worked. She is going to work with the dev core, which we’re thrilled about.


Looking for collections

Now that we’re more public, we’ll be blogging more (I hope and intend).

We’re working on getting our initial build up, and have run into some of the usual sorts of problems getting it mounted on our new VM. It may take a day or two.

In the meantime, we’re continuing to look for collection metadata we can make accessible through the platform’s API. The ideal collection metadata (for our nefarious purposes) would be completely unencumbered, would exist at both the collection level and the item level, and would point at an interesting variety of media types. If you’ve got some lying around, let us know. In fact, you can be the first person to try our email address: And if that doesn’t work, for now use my address: self at

To those who have contributed already or are in the process: Thanks!

Hello world

This is where the dev core team working on the DPLA platform will be blogging. Is blogging.

Now to explicate that first sentence: The DPLA is the Digital Public Library of America. The platform will be a set of services that enable developers to build apps and integrations that make use of the metadata being contributed to the DPLA; that metadata will point at and contextualize distributed content and collections. That’s roughly the idea, anyway. One of the main challenges facing the DPLA (and obviously directly affecting the dev core team) is figuring out exactly what that means. Finally, the dev core is the small group of people dedicated to developing the platform, in public and collaboratively. The launch date for the DPLA is April 2013. Ulp.

There’s not a lot to see at this dev core site yet. We are right now in the process of putting together the initial set of communication tools, which will include a mailing list, a wiki, an IRC channel, and a Twitter account. What are we missing?

The dev core is housed at Harvard, and consists of people from the Berkman Center, the Library Innovation Lab, the DPLA Secretariat, and Pod Consulting. This is an interim team, in place until April 27, 2012, when the DPLA meets in plenary session in San Francisco.

  • Nick Caramello
  • Daniel Collis-Puro
  • Paul Deschner
  • Sebastian Diaz
  • Kim Dulin
  • Rebekah Heacock
  • Maura Marx
  • Laura Miyakawa
  • Matt Phillips
  • David Weinberger

We’ve also been consulting with Martin Kalfatovic and Chris Freeland, co-chairs of the DPLA Tech Workstream, and hope and plan and count on working closely with the entire Tech Workstream.

There will be a home page for this effort at, but for now we’re thinking that will be mainly just a simple directory of pages. We expect the bulk of the substantial communication will be at the wiki, which Daniel C-P is putting together even as I type this.

We’ll have more to say. For now: We’re excited! And we’re counting on you – yes, you – to help make the DPLA a reality by April 2013.