About dweinberger

I'm a writer and co-director of the Harvard Library Innovation Lab, and a member of the DPLA Platform dev core. I am also a senior researcher at the Harvard Berkman Center for Internet & Society.

First build released

The first and highly tentative build is up and ready for you to poke at.

Our plan is to simultaneously and incrementally build out a reference API and a technical specification, letting them inform each other, while being guided by your participation, experience, and expertise.

First, about the tech spec, or more exactly, the scope document: We’re working with Nick Caramello and Pod Consulting on building a useful set of docs that lay out a proposed path at multiple levels, from descriptions of the strategies for each area to detailed specs. As a start, a couple of weeks ago we posted a plain English overview on the wiki, which is both exceedingly general and hugely provisional. We will continue to iterate (with you, we very much hope) on far more detailed and technical documents over the next couple of months, primarily on the wiki. The scoping docs will raise issues we will need to address together, and will provide a way for us to make decisions about what functionality to support and what the (loosely-coupled) infrastructure should include. We need your help and participation, because the questions are huge, our team is tiny, and the deadline of April 2013 is just around the corner. Are we including the right services? Architecting it appropriately? In a scalable yet do-able way? Are there open source projects we should be using? What else?

To get started with today’s build, here are the links you need:

API Base (returns JSON):
http://api.dp.la/dev/item/

Sample query (returns JSON):
http://api.dp.la/dev/item/?search_type=keyword&query=internet

Query Builder (this is a really useful tool):
http://apps.dp.la/dev/query-builder/

API Documentation:
http://dp.la/dev/wiki/Documentation

Source:
https://github.com/dpla

Wiki page for this build, with details about what’s in it:
http://dp.la/dev/wiki/31-01-2012
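
If you would rather poke at the API from a script than a browser, here is a minimal Python sketch that rebuilds the sample query above and fetches the result. The base URL and the search_type and query parameters come from the links above; the function names, the default timeout, and the decision to return the decoded JSON untouched are assumptions made for illustration, since the response format is described on the documentation page rather than here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the "API Base" link in this post.
API_BASE = "http://api.dp.la/dev/item/"


def build_query_url(query, search_type="keyword", base=API_BASE):
    """Return the item-search URL for a query (hypothetical helper)."""
    params = urlencode({"search_type": search_type, "query": query})
    return f"{base}?{params}"


def fetch_items(query, search_type="keyword", timeout=10):
    """Fetch the query result and decode it as JSON.

    The structure of the response is not assumed here; whatever the
    API returns is handed back as plain Python objects.
    """
    url = build_query_url(query, search_type)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the same URL as the sample query above.
    print(build_query_url("internet"))
```

Calling fetch_items("internet") needs network access and a live dev build, so treat it as something to try rather than something guaranteed to work.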

Note that much of the data in this release is dummy data, although we have included real data from the Biodiversity Heritage Library (thank you!) and from The Bancroft Library at the University of California, Berkeley (thank you!). The wiki build page explains which data has been dummified, and contains the necessary disclaimers, including the important one that this metadata is provided purely for experimentation.

We are setting up bug reporting via bugs@dp.la, which will feed into a Redmine instance. The email should be working in the next day or two.

Remember that the main page of the wiki lists ways that you can communicate and participate. We urge you to do so. There is no possibility of our little group doing this right or at all on our own.

Interesting conversations

I should have reported on each of these the day they occurred, but, well, I didn’t. Sorry. (Note that the “we” in each case includes multiple members of the dev core.)

Last week we talked with David Smith and R. Manmatha from U Mass about their work on identifying languages in scanned text. They are able to report on what percentage of a work is in each of the supported languages (six at the moment), which would be very valuable information for the DPLA platform to make available. Indeed, when the metadata record for a work lists the languages, the percentage of “unknown” that their recognition software reports can indicate problems with the OCR.
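
To make that last point concrete, here is a rough Python sketch of how the platform might use such a report. It is not their software or their output format: the function name, the dictionary of language shares, and the 20% cutoff are all hypothetical, chosen only to illustrate the idea that a large “unknown” share in a work whose languages are already listed hints at bad OCR.

```python
def flag_possible_ocr_problem(declared_languages, detected_shares, max_unknown=0.2):
    """Return True when the "unknown" share looks suspiciously high.

    declared_languages: languages listed in the work's metadata record.
    detected_shares: hypothetical report mapping a language code (or
        "unknown") to the fraction of the text attributed to it.
    max_unknown: assumed cutoff; a real value would need tuning.
    """
    if not declared_languages:
        # With no declared languages, a high "unknown" share may just
        # mean the work is in an unsupported language, so do not flag it.
        return False
    return detected_shares.get("unknown", 0.0) > max_unknown


if __name__ == "__main__":
    report = {"en": 0.6, "unknown": 0.4}
    print(flag_possible_ocr_problem(["en"], report))
```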

We also spent a morning with Sebastian Hammer of Index Data talking about federated search. Sebastian favors a hybrid of federated and union search, to gain performance and to access real-time data. For the DPLA platform, the most relevant real-time data is, perhaps, availability of works at local libraries.

This morning we spent an hour skyping with Nasos Drosopoulos and Stefanos Kollias of the MINT project about whether and how to integrate their metadata mapping service into the platform’s ingestion processes. They’re going to send us some more literature, and we’re going to provide them with some sample data.

Last week, I had the wonderful opportunity to have dinner with MacKenzie Smith, who knows an enormous amount about what library tech has been tried and what’s worked. She is going to work with the dev core, which we’re thrilled about.

 

Looking for collections

Now that we’re more public, we’ll be blogging more (I hope and intend).

We’re working on getting our initial build up, and have run into some of the usual sorts of problems getting it mounted on our new VM. It may take a day or two.

In the meantime, we’re continuing to look for collection metadata we can make accessible through the platform’s API. The ideal collection metadata (for our nefarious purposes) would be completely unencumbered, would attach both at the level of the collection and of items, and would point at an interesting variety of media types. If you’ve got some lying around, let us know. In fact, you can be the first person to try our email address: dev@dp.la. And if that doesn’t work, for now use my address: self@evident.com

To those who have contributed already or are in the process: Thanks!

Hello world

This is where the dev core team working on the DPLA platform will be blogging. Is blogging.

Now to explicate that first sentence: The DPLA is the Digital Public Library of America. The platform will be a set of services to enable developers to build apps and integrations that make use of the metadata being contributed to the DPLA; that metadata will point at and contextualize distributed content and collections. That’s roughly the idea, anyway. One of the main challenges facing the DPLA (and obviously directly affecting the dev core team) is figuring out exactly what that means. Finally, the dev core is the small group of people dedicated to developing the platform, in public and collaboratively. The launch date for the DPLA is April 2013. Ulp.

There’s not a lot to see at this dev core site yet. We are right now in the process of putting together the initial set of communication tools, which will include a mailing list, a wiki, an IRC channel, and a Twitter account. What are we missing?

The dev core is housed at Harvard, and consists of people from the Berkman Center, the Library Innovation Lab, the DPLA Secretariat, and Pod Consulting. This is an interim team, in place until April 27, 2012, when the DPLA meets in plenary session in San Francisco.

  • Nick Caramello
  • Daniel Collis-Puro
  • Paul Deschner
  • Sebastian Diaz
  • Kim Dulin
  • Rebekah Heacock
  • Maura Marx
  • Laura Miyakawa
  • Matt Phillips
  • David Weinberger

We’ve also been consulting with Martin Kalfatovic and Chris Freeland, co-chairs of the DPLA Tech Workstream, and hope and plan and count on working closely with the entire Tech Workstream.

There will be a home page for this effort at http://dp.la/dev/, but for now we’re thinking that will be mainly just a simple directory of pages. We expect the bulk of the substantial communication will be at the wiki, which Daniel C-P is putting together even as I type this.

We’ll have more to say. For now: We’re excited! And we’re counting on you (yes, you) to help make the DPLA a reality by April 2013.