Thursday, March 1st, 2018...8:00 am

Born-Digital Blog Post #2: Generating the Report


This post continues the series “Behind the Scenes at Houghton,” giving a glimpse into the inner workings of the library’s mission to support teaching and research. Thanks to Magdaline Lawhorn, Administrative Fellow & Project Archivist, for contributing this post.

Houghton’s born-digital survey journey continues to the next stage, beginning with the procurement and analysis of reports illuminating the extent of our holdings. You might be thinking: why are they bothering to generate reports from materials already at Houghton? Why not just create new workflows and policies that are mindful of born-digital handling for future ingest? In this case we are looking to the past to inform our present and future practices. But first, we must find and preserve the media hidden in our collections. It is estimated that this media won’t be readable by 2030.

We want to be as thorough as possible, in the hopes that we will eliminate the need for a future born-digital backlog survey. No longer will born-digital materials get cast aside, overlooked, and overshadowed by their analog counterparts. In conjunction with the backlog survey we are updating our accessioning procedures to incorporate born-digital material practices. With these new workflows we will log media at accessioning by removing and photographing each object and assigning it a unique identifier. These changes remedy the problem that born-digital materials in our holdings currently face, allowing us to forge ahead without adding to the backlog.

Our goal was to pull two reports: one from Aleph (our integrated library system) through a Cognos report, and the other from ArchivesSpace (our archival collection management platform) via Harvard LibraryCloud. To generate a list of potential collections harboring born-digital material, we first had to define what we were looking to get out of the report. We decided that the report needed to deliver a few basic elements: identity of the collection; quantity of materials; and type of born-digital materials.

We had figured out what we wanted, but how would we go about extracting this data from two different information management systems (Aleph and ArchivesSpace)? In order to extract meaningful data we needed to establish a controlled vocabulary. Collaborating with our colleague Vernica Downey, Metadata Librarian, we created a controlled list of 28 terms commonly used to describe various born-digital media. Our controlled vocabulary enabled us to search for both current and legacy media (outdated technology), and included terms such as compact disk, jaz drive, USB, digital media, and floppy disk.

After establishing our taxonomy we needed to decide which fields within Aleph and ArchivesSpace to search. In Aleph, we chose to search across three fields: 500 general note; 520 summary note; and the 545 biography/history note. Around 2011, Houghton made a concerted effort to identify both born-digital and audio-visual materials by using the 500 general note field to include the standard phrase “Includes audiovisual and/or digital media: (specify type)”. This forward-thinking, lightweight tracking method has proven to be quite instrumental in our search. As for ArchivesSpace, we determined that searching the scope & content note in the resource records would suffice. With the help of Vernica we successfully pulled a report from Aleph, which generated a list of 135 collections. After analyzing the spreadsheet we determined that about half of the collections discovered were false hits. False hits were generated by audio-visual materials, which share a lot of common terminology with born-digital media, so this overlap was expected.
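For readers curious what this kind of search looks like in practice, here is a minimal Python sketch. It is not the Cognos report we actually ran. It assumes each catalog record has already been exported as a dictionary keyed by MARC field tag, which is a hypothetical structure, and the names `TERMS`, `find_terms`, and `candidate_collections` are made up for illustration. Only the five vocabulary terms named above are included, not the full list of 28.

```python
import re

# Five of the 28 controlled-vocabulary terms named in this post.
# The full list is not reproduced here.
TERMS = ["compact disk", "jaz drive", "USB", "digital media", "floppy disk"]

# The three Aleph note fields searched: general, summary, biography/history.
NOTE_FIELDS = ("500", "520", "545")

# One case-insensitive pattern with word boundaries and an optional plural "s",
# so "floppy disks" matches but a term buried inside a longer word does not.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in TERMS) + r")s?\b",
    re.IGNORECASE,
)


def find_terms(record):
    """Return the sorted, lowercased vocabulary terms found in a record's notes."""
    hits = set()
    for field in NOTE_FIELDS:
        # Assumption: each field maps to a list of note strings (a field can repeat).
        for note in record.get(field, []):
            for match in PATTERN.finditer(note):
                hits.add(match.group(1).lower())
    return sorted(hits)


def candidate_collections(records):
    """Map record id -> matched terms, keeping only records with at least one hit."""
    results = {}
    for record in records:
        terms = find_terms(record)
        if terms:
            results[record["id"]] = terms
    return results


if __name__ == "__main__":
    # Hypothetical sample records, invented for this example.
    sample = [
        {"id": "c1", "500": ["Includes audiovisual and/or digital media: 3 floppy disks."]},
        {"id": "c2", "520": ["Letters and photographs."]},
    ]
    print(candidate_collections(sample))
```

Note that the standard phrase itself contains “digital media”, so a record describing only audiovisual material would still match. A list produced this way is a set of candidates that still needs review by hand, which is consistent with the false hits described above.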

Shows the percentages of audiovisual and born-digital material found. The undetermined category represents materials whose records did not designate a specific media type, only the phrase “audiovisual and digital media”.


Represents the format type of born-digital materials identified (not the amount of each format of materials) in the Aleph report.

What was unexpected was how difficult it would be to extract the same data from ArchivesSpace. At Houghton Library we don’t have access to the ArchivesSpace API (the application programming interface for ArchivesSpace), though it’s on our list of priorities for future development. Instead, we tried to extract the data using LibraryCloud, a metadata aggregator developed by Harvard Library. LibraryCloud is a hub that makes bibliographic metadata openly available for re-use through an open API, covering every unique resource within Harvard Library’s various systems. Each component within a finding aid has a record in LibraryCloud. There are roughly 20,000,000 aggregated metadata records! LibraryCloud is in active development, and its current functionality did not help us narrow our search to Houghton alone. So for now, we are attempting to search using ArchivesSpace’s internal reporting tools.


Stay tuned for the next post where we will discuss more about our born-digital process!
