Fair Use Week 2017: Day Five With Guest Expert Sara R. Benson

Make “Non-Consumptive Use” Part of Your Fair Use Vocabulary

by Sara R. Benson

The HathiTrust Digital Library continues to push the boundaries of open access.

In late 2016, the Library’s Research Center made the entire corpus available for non-consumptive use through its Extracted Features dataset.  Using this dataset, researchers can access the non-expressive content of public domain and copyright-protected works for the purpose of performing data analysis. The dataset opens the corpus to computational research techniques such as topic modeling or machine classification while limiting traditional forms of reading by virtue of its abstracted data structure.


The structured files, presented in JSON format, provide information about the text (the ideas) without revealing its original form (the expression). Although the term “non-consumptive” was never specifically defined in the HathiTrust case,[1] the type of text mining at issue in that case serves as the building block for the transformative use asserted by the HathiTrust and the users of the Extracted Features dataset.
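To make the idea/expression distinction concrete, here is a minimal sketch of how a page might be reduced to non-expressive features. It is an illustration only and does not reproduce the actual Extracted Features schema: the function name `extract_page_features` and the field names `tokenCount` and `tokenCounts` are assumptions chosen for this example.

```python
import json
import re
from collections import Counter


def extract_page_features(page_text: str) -> dict:
    """Reduce a page to unordered token counts (hypothetical field names).

    Word order, and with it the author's expression, is discarded;
    only which words appear and how often is kept.
    """
    tokens = re.findall(r"[a-z']+", page_text.lower())
    counts = Counter(tokens)
    return {
        "tokenCount": len(tokens),
        # Sorted alphabetically so the output carries no trace of word order.
        "tokenCounts": dict(sorted(counts.items())),
    }


if __name__ == "__main__":
    page = "The cat sat on the mat. The mat was red."
    print(json.dumps(extract_page_features(page), indent=2))
```

Because only counts survive, two pages that use the same words in a different order yield identical output. A researcher can measure vocabulary and frequency, but cannot read the original sentences back out of the file.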

Notably, in Authors Guild v. HathiTrust, the Second Circuit Court of Appeals stated that the “creation of a full-text searchable database is a quintessentially transformative use.”[2]  The HathiTrust uses the following definition for non-consumptive research:  It is “research in which computational analysis is performed on one or more volumes (textual or image objects) in the HTDL, but not research in which a researcher reads or displays substantial portions of an in-copyright or rights-restricted volume to understand the expressive content presented within that volume.”[3]


In the case of the Extracted Features dataset, instead of reading or consuming the text, researchers work from the extracted content to perform statistical analyses, pull out derived data sets, and look at patterns across words to reach new research conclusions.  This is a decidedly different use than the typical use of a work of fiction (say, Harry Potter), which is unequivocally for narrative entertainment.

Here instead, researchers are engaged in another important fair use endeavor: transforming the transmission of and interaction with the work from readable text to minable data in order to better understand the connections among literature, historical documents, and society.

Thus, non-consumptive use, when defined correctly, could never be construed as anything but a fair use. The concept can provide an important framework for other libraries and data providers who wish to open greater access to datasets without infringement.  It also can embolden researchers to incorporate computational techniques into their scholarship, much of which to date has been limited to pre-twentieth-century inquiries.

And so, with this brief introduction, I issue a call to all fair use advocates:  please make “non-consumptive use” a part of your fair use vocabulary, promote the use of the HathiTrust Extracted Features Dataset, and continue to promote fair use rights.

  1. It was, however, defined in the amended settlement agreement, ultimately rejected by the court, in Authors Guild v. Google, available at https://www.authorsguild.org/wp-content/uploads/2014/10/2009-Nov-13-AGvGoogle-Amended-Settlement-Agreement.pdf.
  2. 755 F.3d 87, 97 (2d Cir. 2014)
  3. HathiTrust Digital Library, HathiTrust Research Center, Non Consumptive Use Research Policy, available at https://www.hathitrust.org/htrc_ncup

Sara R. Benson is Copyright Librarian & Assistant Professor at the University of Illinois Library