Going live with Harvard’s catalog

[Note: Dec. 3, 2013: We’ve updated the links on this page and a bit of the text to reflect the current reality about where things are.]

We’re very pleased not only that Harvard University has decided to make virtually its entire catalog of bibliographic records available for bulk download under a Creative Commons 0 (public domain) license, but also that we’re providing programmatic access to those records in their entirety through the LibraryCloud API. That’s over 12 million full records in the MARC21 format.

It’s live now. Begin with the API documentation (which includes some legal usage notes) here. If you instead want to do a bulk download, please go here.

We are using a two-tier schema. We have a simplified core which combines and extends Dublin Core and Schema.org. It works across data sets as well as we can manage. But we are preserving all the metadata that doesn’t fit into that core. You can access it if you know the schema. In the case of Harvard’s data, it’s MARC21, so the keys are well-known. You can retrieve entire MARC21 records if that’s where your bliss is, or you can grab the fields you want.

The API is an early alpha. Please let us know about problems you encounter.

We’ve also capped access at 3 queries per second from a single IP address. We are feeling our way here, and we think that that’s probably more than any app is going to need for now, unless it’s trying to absorb all the data through the API, in which case we repeat: Go bulk download it. It’s all there, and we’ll all be much happier.
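For anyone writing a client, here is a minimal sketch of one way to stay under that cap by spacing requests out on the client side. The `Throttle` class and its parameters are hypothetical names made up for this illustration; they are not part of the LibraryCloud API, and the sketch makes no assumption about how the server itself enforces the limit.

```python
import time


class Throttle:
    """Hypothetical client-side helper: space calls at least 1/rate seconds apart."""

    def __init__(self, rate=3.0, clock=time.monotonic, sleep=time.sleep):
        # rate is the maximum number of calls per second we allow ourselves.
        self.min_interval = 1.0 / rate
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()  # re-read the clock in case sleep overshot
        self._next_allowed = now + self.min_interval
```

Call `throttle.wait()` immediately before each request. Choosing a rate a little below 3 leaves some margin for timing jitter, and if several processes share one IP address they would need to share the budget between them.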

Thank you, Harvard!

And please note and respect the statement of community norms, including the norm that attribution be given to those who are providing this information, including Harvard and also, importantly, the OCLC. Thank you.

8 thoughts on “Going live with Harvard’s catalog”

  1. Awesome! Can’t wait to see what comes of this for cataloging and metadata. BTW, there isn’t a link for the batch download.

  2. Pingback: Harvard Releases Metadata Into Public Domain — The Digital Shift

  3. Hi there! I’m doing some work with open bibliographic data at Lincoln University in the UK. The project is in its early stages and we’re looking at combining search results from various endpoints and APIs. I’ve been playing with the Harvard data API for a couple of days and the one thing I’m struggling to do is get search results back for multiple criteria, for example, if I query with:

    http://api.dp.la/v0.03/item/?filter=dpla.creator_keyword:cottrell&filter=dpla.title_keyword:ann&limit=20

    it returns no results. Any ideas, or is this unachievable?
