Proteus

Names: James Allan, R. Manmatha, David Smith
Affiliation: University of Massachusetts, Amherst
Partners: Perseus Digital Library Project, Tufts University; Internet Archive
URL: http://books.cs.umass.edu/beta-sprint/

The Proteus project proposes to develop and deploy Proteus, an infrastructure that would let library patrons search the contents of books and discover connections within and across them, such as how often quotations recur over time and which portions of books are republished.

Proteus incorporates state-of-the-art techniques that are trained to be more robust to errors in the recognition of scanned books. These methods are efficient, scale to millions of books, and can optionally be structured to exploit user annotations where available. Proteus enables people to explore books in ways that are not possible, or perhaps even imaginable, today.

The submission consists of an initial demonstration system that automatically annotates, links, indexes, and searches scanned books for text, pictures, and named entities.
