I should have reported on each of these the day they occurred, but, well, I didn’t. Sorry. (Note that the “we” in each case includes multiple members of the dev core.)
Last week we talked with David Smith and R. Manmatha from U Mass about their work on identifying languages in scanned text. They are able to report what percentage of a work is in each of the supported languages (six at the moment), which would be very valuable information for the DPLA platform to make available. Indeed, when the metadata record for a work lists its languages, a high percentage of “unknown” reported by their recognition software can indicate problems with the OCR.
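To make the idea concrete, here is a minimal sketch of how a platform might use per-language percentages. It is not the U Mass software: the per-page labels, the function names, and the 20% threshold are all assumptions made up for illustration.

```python
from collections import Counter


def language_shares(page_labels):
    """Return the percentage of pages assigned to each language label.

    `page_labels` is a hypothetical input: one label per page, e.g. "en",
    "fr", or "unknown". A real recognizer may work at a finer granularity.
    """
    counts = Counter(page_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


def possible_ocr_problem(shares, metadata_languages, unknown_threshold=20.0):
    """Flag a work whose metadata lists languages but whose "unknown" share is high.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    if not metadata_languages:
        # Without listed languages, a high "unknown" share is not informative here.
        return False
    return shares.get("unknown", 0.0) > unknown_threshold


# Hypothetical work: eight pages, three of which could not be identified.
labels = ["en", "en", "en", "fr", "unknown", "unknown", "unknown", "en"]
shares = language_shares(labels)
flag = possible_ocr_problem(shares, ["en", "fr"])
```

With this made-up input, the work is flagged because more than a fifth of its pages come back as “unknown” even though the metadata says it is in English and French.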
We also spent a morning with Sebastian Hammer of Index Data talking about federated search. Sebastian favors a hybrid of federated and union search, to gain performance and to access real-time data. For the DPLA platform, the most relevant real-time data is, perhaps, availability of works at local libraries.
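One way to read the hybrid approach is: answer the query from a local union index for speed, then ask remote sources only for the data that has to be live. The sketch below assumes that reading. The record fields, the `hybrid_search` function, and the availability lookup are hypothetical, and none of it reflects Index Data’s actual software.

```python
def hybrid_search(query, union_index, availability_lookup):
    """Search a local union index, then attach real-time availability.

    `union_index` is a hypothetical list of harvested records (dicts with
    "id" and "title"). `availability_lookup` stands in for a live call to a
    local library system and is an assumption of this sketch.
    """
    needle = query.lower()
    # Fast path: match against the pre-harvested union index.
    hits = [rec for rec in union_index if needle in rec["title"].lower()]
    # Federated step: fetch only the real-time field for the matching records.
    return [dict(rec, available=availability_lookup(rec["id"])) for rec in hits]


# Hypothetical data standing in for a harvested index and a live circulation system.
index = [
    {"id": "a1", "title": "Moby-Dick"},
    {"id": "b2", "title": "Walden"},
    {"id": "c3", "title": "Moby-Dick: A Study Guide"},
]
on_shelf = {"a1": True, "c3": False}

results = hybrid_search("moby", index, lambda rec_id: on_shelf.get(rec_id, False))
```

The design choice here is that the slow, remote part of the search runs only for the handful of records the user will actually see.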
This morning we spent an hour Skyping with Nasos Drosopoulos and Stefanos Kollias of the MINT project about whether and how to integrate their metadata mapping service into the platform’s ingestion processes. They’re going to send us some more literature, and we’re going to provide them with some sample data.
Last week, I had the wonderful opportunity to have dinner with MacKenzie Smith, who knows an enormous amount about what library tech has been tried and what’s worked. She is going to work with the dev core, which we’re thrilled about.