Friday, June 1st, 2018, 6:00 am

Born-Digital Blog Post #5: Analyzing Results of the Survey


This post continues the series, “Behind the Scenes at Houghton”, giving a glimpse into the inner workings of the library’s mission to support teaching and research. Thanks to Magdaline Lawhorn, Administrative Fellow & Project Archivist, for contributing this post.

So far in this series, I have discussed Houghton’s first steps in the born-digital initiative, the burgeoning digital forensic workstation, the allocation of physical space, and our strategy thus far. Let us take a few steps back in order to move forward, and revisit the survey report we discussed in Blog Post #2. Using Cognos, a reporting tool, we surveyed our holdings by pulling a report from Aleph (the staff mode of our integrated library system) that queried specific fields for a devised vocabulary of terms (e.g., USB, flash drive, thumb drive). This tool has allowed us to search for and identify collections that have born-digital components.

Statistics from the Cognos report pulled from Aleph. Undetermined data represents inconclusive information.

The report captured 135 collections. Of these, 58 were false hits, 13 were inconclusive, and 77 definitely contained born-digital materials. How did we interpret this data? The false hits were partly caused by our controlled vocabulary: while creating the term list, we acknowledged that our born-digital terms would overlap with audio-visual terminology, so this outcome was expected. The inconclusive hits were also the result of ambiguity. Most came from phrases such as “digital media,” which gave no context for deciding whether the material was born-digital or audio-visual. In Blog Post #2 we discussed using the 500 general note field to record the standard phrase “Includes audiovisual and/or digital media: (specify type)”. Because this phrase was not standardized until around 2011, born-digital material cataloged before then may have been described in any number of ways. The specific type of carrier (or legacy media) was therefore essential to interpreting the data, since it let us determine whether a given hit was born-digital or audio-visual.
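To make the interpretation step concrete, here is a minimal Python sketch of sorting report hits by carrier type. It is not the query Cognos ran against Aleph; the term lists, the `mentions` and `classify_note` functions, and the sample notes are all hypothetical, chosen only to show how a named carrier settles the born-digital versus audio-visual question while a bare phrase like “digital media” does not.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; not Houghton's actual vocabulary.
BORN_DIGITAL_CARRIERS = [
    "usb", "flash drive", "thumb drive", "zip disk",
    "floppy disk", "hard drive", "laptop", "cd-rom",
]
AUDIOVISUAL_CARRIERS = [
    "audiocassette", "videocassette", "vhs", "audio cd", "phonograph record",
]


def mentions(text: str, term: str) -> bool:
    """True if `term` (or its simple plural) appears as a whole word in `text`."""
    pattern = r"\b" + re.escape(term) + r"s?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def classify_note(note: str) -> str:
    """Sort one catalog note into born-digital, audio-visual, or inconclusive."""
    # Any born-digital carrier is enough to flag the collection.
    if any(mentions(note, term) for term in BORN_DIGITAL_CARRIERS):
        return "born-digital"
    # An audio-visual carrier with no born-digital one marks a false hit.
    if any(mentions(note, term) for term in AUDIOVISUAL_CARRIERS):
        return "audio-visual (false hit)"
    # No carrier named (e.g. a bare "digital media"): cannot decide.
    return "inconclusive"


if __name__ == "__main__":
    sample_notes = [
        "Includes audiovisual and/or digital media: 3 zip disks",
        "Includes audiovisual and/or digital media: 12 audiocassettes",
        "Collection includes digital media.",
    ]
    for note in sample_notes:
        print(f"{classify_note(note):26} {note}")
    print(Counter(classify_note(n) for n in sample_notes))
```

Checking born-digital carriers first reflects that a single born-digital item is enough to put a collection in scope. A real vocabulary would need extra care with terms such as CD or DVD, which can describe either kind of material.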

Legacy Media: Zip disk

The Cognos report is the backbone of our born-digital survey; however, it fell short in some respects. Initially, we wanted the report to deliver three things: the identity of each collection, the type of born-digital materials, and the quantity of those materials. After analyzing the data, we obtained the identity of each collection from its title and call number, and the type of born-digital materials by identifying the various carrier types. However, the report did not achieve the third goal of establishing the quantity of materials. Quantity was captured only sporadically, because it had not been recorded consistently or in a uniform manner. It was sometimes given as the number of born-digital items, sometimes only as the potential maximum capacity (the maximum amount of storage any one type of media carrier can hold), and sometimes not at all. Instead, we have decided to record both the quantity of items and their maximum capacity as part of our accessioning process.
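One way to keep both measurements together at accessioning is a small record per carrier type, from which an upper bound on storage can be derived. The Python sketch below is hypothetical: the `CarrierEntry` class, the `total_max_capacity_mb` function, the collection name, and the nominal capacities (a 100 MB Zip disk, a 700 MB CD, a 4.7 GB DVD, in decimal megabytes) are assumptions for illustration, not Houghton's accession schema.

```python
from dataclasses import dataclass

# Assumed nominal capacities in decimal megabytes. Real carriers vary
# (e.g. 250 MB and 750 MB Zip disks, 650 MB CDs), so these are illustrative.
NOMINAL_CAPACITY_MB = {
    "3.5-inch floppy disk": 1.44,
    "zip disk": 100,
    "cd": 700,
    "dvd": 4700,
}


@dataclass
class CarrierEntry:
    """One line of a hypothetical accession record: a count of one carrier type."""
    collection: str  # collection title or call number
    carrier: str     # key into NOMINAL_CAPACITY_MB
    quantity: int    # number of physical items counted at accessioning

    def max_capacity_mb(self) -> float:
        # Upper bound only: the carriers may hold far less data than this.
        # Raises KeyError for a carrier type missing from the table.
        return self.quantity * NOMINAL_CAPACITY_MB[self.carrier]


def total_max_capacity_mb(entries: list[CarrierEntry]) -> float:
    """Sum the maximum-capacity bounds across all entries."""
    return sum(entry.max_capacity_mb() for entry in entries)


if __name__ == "__main__":
    entries = [
        CarrierEntry("Hypothetical papers", "zip disk", 3),
        CarrierEntry("Hypothetical papers", "cd", 2),
    ]
    for entry in entries:
        print(entry.carrier, entry.quantity, entry.max_capacity_mb())
    print("total (MB):", total_max_capacity_mb(entries))
```

Keeping `quantity` as its own field means the exact item count survives alongside the capacity figure, which is only a ceiling; the survey showed that recording just one of the two leaves the other unrecoverable.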


Statistics on born-digital material based on areas of collection. HOU stands for Modern Books and Manuscripts, THE for Harvard Theatre Collection, and P&GA for Printing and Graphic Arts.

What surprised us most about the data generated for the report was how much born-digital material we had unintentionally collected, where it was being collected, the diversity of legacy media carriers, and the overall state of the collections (both born-digital and analog). The report highlighted which collecting areas were accumulating born-digital materials. At Houghton, two departments, the Harvard Theatre Collection and Modern Books and Manuscripts, have collected 98% of the born-digital materials. Those materials consist of Zip disks, USB drives, external hard drives, laptops, 3.5” and 5.25” floppy disks, CDs, and DVDs. We have also encountered collections that were minimally processed or unprocessed. Finding born-digital materials scattered throughout unprocessed materials has given us cause to shift our current born-digital accessioning practice. In the next blog post I will outline the born-digital accessioning workflow.



  • Michael Cuthbert
    June 1st, 2018 at 9:42 am

    Congrats on this great series. A quick note to say that, for music historians, an interesting subset of the data is the proportion that is both born-digital AND audiovisual: generated score files for electronic music; MIDI versions of pieces not yet performed; renderings of set designs not yet (or ever) built. These will eventually be used, as sketch studies have been in the past, to see the evolution of a work before premiere or publication.

    Myke Cuthbert—music professor and director of DH, MIT

  • Hello Myke. Thank you! This is really good to know. We decided to forge ahead with just the born-digital portion because it is a more manageable number of collections to focus on and build an infrastructure around. That being said, it would be great to be able to make our hidden audio-visual gems accessible as well, given the research value you describe.