John Tenniel, c. 1864. Study for illustration to Alice’s Adventures in Wonderland. Harcourt Amory collection of Lewis Carroll, Houghton Library, Harvard University.

We’ve just completed the spring semester, during which I taught a system design course offered jointly in Engineering Sciences and Computer Science. The aim of ES96/CS96 is to help students learn the process of solving complex, real-world problems, applying engineering and computational design skills, by undertaking an extended, focused effort on an open-ended problem defined by an interested “client”.

The students work independently as a self-directed team. The instructional staff provides coaching, but the students organize and carry out all of the work, from fact-finding to design to development to presentation of their findings.

This term the problem concerned the Harvard Library’s exceptional special collections, vast holdings of rare books, archives, manuscripts, personal documents, and other materials that the library stewards. Harvard’s special collections are unique and invaluable, but they are useful only insofar as potential users can find and gain access to them. Despite the herculean efforts of an outstanding staff of archivists, the sheer scope of the collections means that large portions are not catalogued, or are catalogued in insufficient detail, making those materials essentially unavailable for research. And the problem is growing as the cataloging backlog mounts. The students were asked to address core questions about this valuable resource: What accounts for this problem at its core? Can tools from computer science and technology help address it? Can they even qualitatively improve the utility of the special collections?

The clients were the leadership of Harvard’s premier Houghton and Schlesinger libraries. The students received briefings from William Stoneman, Florence Fearrington Librarian of Houghton Library, and Marilyn Dunn, Executive Director of the Schlesinger Library and Librarian of the Radcliffe Institute; toured both libraries; and met with a wide range of archivists and librarians, who were incredibly generous with their time and expertise. I’d like to express my deep appreciation and thanks to all of the library staff who helped out with the course. Their participation was vital.

The students’ recommendations centered on the design, development, and prototyping of an “archivist’s workstation” and the unconventional “flipped” collections processing that the workstation enables. Their process involves exhaustive but lightweight digitization of a collection as a precursor to highly efficient metadata acquisition on top of the digitized images, rather than the conventional approach of digitizing selectively only after all processing of the collection is complete. The “digitize first” approach means that each document need be handled only once; all of the sorting, arrangement, and metadata application is then performed virtually, using optimized user interfaces that the students designed for these purposes. The output is a dynamic finding aid with images of all documents, complete with search and faceted browsing of the collection, to supplement the static finding aid of traditional archival processing. The students estimate that processing in this way would be faster than current methods while delivering a superior result. Their demo video (below) gives a nice overview of the idea.

The deliverables for the course are now available at the course web site, including the final report and a videotape of their final presentation before dozens of Harvard archivists, librarians, and other members of the community.

I hope others find the ideas that the students developed as provocative and exciting as I do. I’m continuing to work with some of them over the summer and perhaps beyond, so comments are greatly appreciated.


4 Responses to “Processing special collections: An archivist’s workstation”

  1. Dorothea Salo Says:

    Stuart, this is a fabulous project! (It’s amazing what happens when students are set a problem and given freedom to solve it. I use this technique a lot in my own classes, and have yet to be disappointed.)

    Your students may be interested in reading about “More Product Less Process,” a lean-mean archival processing method argued for by Greene and Meissner (“More Product, Less Process: Revamping Traditional Archival Processing” American Archivist 68:208-263). MPLP has been extensively discussed in the archival literature; don’t stop at the article I just cited!

    MPLP lessons are now being applied to digitization of archival materials; I know that archivist Josh Ranger at the University of Wisconsin at Oshkosh has done (among other things) enlightening user-opinion surveys on MPLP digitization. I can try to dig up additional work in this area if that would be helpful.

    Your students’ work also comes at a crucial time in the evolution of archival-description software. The two dominant open-source packages in this area, Archon and Archivists’ Toolkit, are merging into a new package called ArchivesSpace http://www.archivesspace.org/ . I encourage your students to look into getting involved in development and testing!

    Disclaimer: I’m not an archivist myself; I only train a few of them in digital tools and techniques.

  2. Stuart Shieber Says:

    @Dorothea: Thanks for the pointers. MPLP is definitely the theme of the students’ proposals. There’s some use of Archivists’ Toolkit here at Harvard, but not at all of our special collections. I hope we’ll be able to look into these newer systems as well.

  3. Daniel Mietchen Says:

    One step I am missing in this workflow is the licensing. Sure, many items in the special collections are old enough not to have this problem, but a good portion probably isn’t. Having a standard protocol for when to license what in what way could really boost the reuse of such resources.

  4. Stuart Shieber Says:

    @Daniel: Yes, the issues of restricting access based on privacy and rights are very important, and they received little discussion in the students’ work beyond an understanding that provision needs to be made for archivists’ control over the public visibility of materials for these reasons. By the way, even out-of-copyright materials in the collection may have distribution issues by virtue of the library’s contractual obligations to the donor, so age is not a sufficient criterion for distribution. More thought definitely needs to go into this as we proceed with the project.