Update: Open-Access Society Publishers in Wikidata

As described in my post of 6 September 2019, I am working on adding SOAR Catalog references to Wikidata. Here is an ongoing list as of 11 September 2019. It will be revised in the near future.


Anniversaries of Open Access Policies

On 10 September 2009, the U.S. National Center for Atmospheric Research adopted an open access policy.

A dynamic list of more than 650 open access policies and their dates of adoption can be found in Wikidata: https://w.wiki/8DF. The list contains many facts first collected in the UK-based Registry of Open Access Repositories Mandates and Policies (ROARMAP).

Want to contribute? Get started here. 


Open-Access Society Publishers in Wikidata

In September 2013, Caroline Sutton, Peter Suber, and Amanda Page launched the third edition of the Society Open Access Research (SOAR) Catalog. As of 2019 it is maintained online as a continuously updated Google spreadsheet.

Recently I chatted with the catalog’s editors and began an independent, volunteer effort to amplify the carefully maintained contents of this list via the public-domain knowledge base Wikidata. My project aims to include a reference to the SOAR Catalog in each relevant Wikidata society and journal entry. Below is a description of the steps involved and some of the issues that have arisen so far.

PREPARATION

To find out which societies were already represented in Wikidata, I downloaded the SOAR Catalog as a CSV file and imported it into a new Google sheet, where each row was assigned a unique identifier.
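For anyone who would rather script this step than do it in a spreadsheet, numbering the rows of a CSV export can be sketched in Python. This is a minimal sketch, not the process used here: the column names, the sample rows, and the "SOAR-" prefix are all hypothetical, and the real SOAR export has its own headers.

```python
import csv
import io

# Hypothetical sample; the real SOAR export has its own column headers.
SAMPLE = (
    "Society,Journal,Country\n"
    "Example Soc,Journal A,Country X\n"
    "Another Soc,Journal B,Country Y\n"
)

def add_row_ids(csv_text, prefix="SOAR"):
    """Parse CSV text and attach a unique, zero-padded ID to every row."""
    rows = []
    for n, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        row["row_id"] = f"{prefix}-{n:04d}"  # e.g. SOAR-0001; sorts correctly as text
        rows.append(row)
    return rows

for row in add_row_ids(SAMPLE):
    print(row["row_id"], row["Society"])
```

Zero-padding keeps the identifiers in order when a spreadsheet sorts them as text.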

The SOAR editors had entered society names in abbreviated form, so I expanded each name to its full form. For example, “Soc” became “Society”, “Amer” became “American”, and so on.
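Expanding abbreviations by hand is slow for hundreds of names, and part of the work can be automated with a whole-word substitution table. In this sketch only “Soc” and “Amer” come from the catalog; the other entries are hypothetical, and automated output would still need a manual check, since some abbreviations are ambiguous.

```python
import re

# Only "Soc" and "Amer" come from the post; the other entries are illustrative.
ABBREVIATIONS = {
    "Soc": "Society",
    "Amer": "American",
    "Assoc": "Association",
    "Int": "International",
}

# \b on both sides matches whole tokens only, so "Soc" inside "Sociology"
# is left alone. An optional trailing period ("Soc.") is consumed too.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b\.?"
)

def expand_name(name):
    """Replace each known abbreviation in a society name with its full form."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1)], name)

print(expand_name("Amer Phys Soc"))
```

Tokens missing from the table (such as “Phys” above) pass through unchanged, which makes them easy to spot and add.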

The revised spreadsheet was downloaded as a CSV file, then uploaded to OpenRefine as a new project.

OPENREFINE

In OpenRefine I reconciled the society names against Wikidata, and the matched Wikidata identifiers were saved in a new OpenRefine column named “Q”. Wikidata items were found for 526 societies, roughly half the total on the SOAR list.

Names were reconciled first against the type “organization” (Q43229, which includes subclasses such as “learned society”, “scientific society”, and “association”), and then again with no type restriction. Reconciliation occasionally turned up more than one Wikidata item for the same society; I examined these duplicates and merged them into one.
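OpenRefine does this work by calling a service that implements the Reconciliation Service API. The two passes described above (typed as Q43229, then untyped) can be sketched in Python as follows. The endpoint URL is an assumption: it is a public Wikidata reconciliation service, but the address has changed over time, so check the one configured in your OpenRefine. Only `build_queries` and `best_matches` run offline; `reconcile` needs network access.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against your OpenRefine reconciliation settings.
ENDPOINT = "https://wikidata.reconci.link/en/api"
ORGANIZATION = "Q43229"  # "organization", the type used in the first pass

def build_queries(names, type_id=ORGANIZATION, limit=3):
    """Build the 'queries' batch defined by the Reconciliation Service API.
    Pass type_id=None for the second, untyped pass."""
    queries = {}
    for i, name in enumerate(names):
        q = {"query": name, "limit": limit}
        if type_id:
            q["type"] = type_id
        queries[f"q{i}"] = q
    return queries

def best_matches(response, names):
    """Map each name to the first candidate the service flags as a match, else None."""
    out = {}
    for i, name in enumerate(names):
        candidates = response.get(f"q{i}", {}).get("result", [])
        hits = [c["id"] for c in candidates if c.get("match")]
        out[name] = hits[0] if hits else None
    return out

def reconcile(names, type_id=ORGANIZATION):
    """POST one batch as form data to the service (network access required)."""
    data = urllib.parse.urlencode(
        {"queries": json.dumps(build_queries(names, type_id))}
    ).encode()
    with urllib.request.urlopen(ENDPOINT, data=data, timeout=30) as resp:
        return best_matches(json.load(resp), names)
```

Names that come back as `None` in both passes are the ones left for individual searching.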

I exported the revised data as a CSV file, then imported it back to the Google sheet.

ANALYSIS

In the Google spreadsheet, I then analyzed the full catalog of 1,043 journals to find patterns. Were there some societies that published more journals than others? What was the geographic distribution by country? When was the catalog data last updated?

I found roughly 881 unique societies listed in the SOAR Catalog.
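The same questions (which societies publish the most journals, and how the journals are distributed by country) can be answered with simple counts. A minimal sketch; the field names and the three rows are hypothetical stand-ins for the 1,043 catalog rows.

```python
from collections import Counter

# Hypothetical rows standing in for the catalog's journal list.
journals = [
    {"journal": "Journal A", "society": "Example Society", "country": "Portugal"},
    {"journal": "Journal B", "society": "Example Society", "country": "Portugal"},
    {"journal": "Journal C", "society": "Another Society", "country": "Croatia"},
]

# Journals per society and per country.
per_society = Counter(row["society"] for row in journals)
per_country = Counter(row["country"] for row in journals)
unique_societies = set(per_society)

print(per_society.most_common(5))  # societies publishing the most journals
print(per_country.most_common(5))  # geographic distribution
print(len(unique_societies))       # number of distinct societies
```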

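The following Google Sheets functions were useful:
=UNIQUE, which returns the distinct rows of a range
=VLOOKUP, which searches down the first column of a range for a key and returns a value from the row it finds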
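For readers working outside Google Sheets, rough Python counterparts of these two functions are sketched below. `unique` keeps values in order of first appearance, as =UNIQUE does, and `build_lookup` supports an exact-match lookup roughly like =VLOOKUP with is_sorted set to FALSE, where the first matching row wins. The society names and the QID_A/QID_B values are placeholders, not real Wikidata identifiers.

```python
def unique(values):
    """Distinct values in order of first appearance, like =UNIQUE."""
    return list(dict.fromkeys(values))

def build_lookup(pairs):
    """Index (key, value) pairs for exact-match lookup; the first occurrence wins."""
    lookup = {}
    for key, value in pairs:
        lookup.setdefault(key, value)
    return lookup

# Placeholder data: one column of society names, one reconciliation result table.
societies = ["Society A", "Society B", "Society A"]
qids = build_lookup([("Society A", "QID_A"), ("Society B", "QID_B")])

distinct = unique(societies)
with_q = [(name, qids.get(name, "")) for name in distinct]
print(with_q)
```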

For a few journals with more than one publisher, all the publisher names had been entered in a single cell. The journal Psychology, Community, and Health, for example, is published by “SPPS – Sociedade Portuguesa de Psicologia da Saúde, SPPC – Sociedade Portuguesa de Psicologia Comunitária, SPSC – Sociedade Portuguesa de Sexologia Clínica, ISPA – Instituto Universitário”. Data in this form is hard to search; it should be split so that each publisher name occupies its own cell.
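Splitting such a cell can be sketched in a few lines. This assumes, based on the one example in the catalog, that entries are separated by “, ” and that each entry reads “ACRONYM – Full name” with an en dash. A publisher name that itself contains a comma would be split incorrectly, so the output still needs review.

```python
# The multi-publisher cell quoted above, copied verbatim.
cell = (
    "SPPS – Sociedade Portuguesa de Psicologia da Saúde, "
    "SPPC – Sociedade Portuguesa de Psicologia Comunitária, "
    "SPSC – Sociedade Portuguesa de Sexologia Clínica, "
    "ISPA – Instituto Universitário"
)

def split_publishers(cell):
    """Split a multi-publisher cell into (acronym, full name) pairs.
    Assumes ', ' between entries and ' – ' between acronym and name."""
    pairs = []
    for entry in cell.split(", "):
        acronym, sep, name = entry.partition(" – ")
        if sep:
            pairs.append((acronym.strip(), name.strip()))
        else:  # entry without an acronym prefix
            pairs.append(("", entry.strip()))
    return pairs

for acronym, name in split_publishers(cell):
    print(acronym, "|", name)
```

Each pair can then go into its own row or cell, so every publisher can be reconciled separately.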

WIKIDATA

For societies not found through OpenRefine reconciliation, I searched for each name individually in Wikidata and, where an item existed, added its Q identifier to my spreadsheet. Where none existed, I created a new Wikidata item. Some society websites required a bit of translation. To create a new item for the Croatian Physical Society, for example, I checked the society’s website and double-checked the spelling of its Croatian name (“Hrvatsko fizikalno društvo”). Google Translate came in handy, as did viewing the page source in the browser’s developer tools.
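The individual searches can also be scripted against the MediaWiki API’s `wbsearchentities` action, which matches item labels and aliases in a given language. This is a minimal sketch, not the method used here. The User-Agent string is a placeholder that you should replace with your own contact details, and only `search` touches the network.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"

def search_url(name, language="en", limit=5):
    """Build a wbsearchentities request for items whose label or alias matches."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def candidates(response):
    """Extract (QID, label, description) triples from a parsed API response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]

def search(name, language="en"):
    """Run the search (network access required)."""
    req = urllib.request.Request(
        search_url(name, language),
        headers={"User-Agent": "soar-wikidata-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return candidates(json.load(resp))
```

Searching in the society’s own language, for example `search("Hrvatsko fizikalno društvo", language="hr")`, can find items that have no English label yet.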

As of this writing, eight new Wikidata items have been created for societies (Q67167604, Q67167730, Q67167802, Q67167898, Q67167964, Q67168062, Q67168156, Q67168296). It will take some time to finish the remaining 350 names on the list.
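The post does not say how these eight items were created. For a larger batch, one option is QuickStatements’ tab-separated V1 format, in which `CREATE` makes a new item and `LAST` refers to the item just created. This minimal sketch only sets labels (`Len`, `Lhr`, and so on). Search first, so you don’t create duplicates.

```python
def create_commands(labels):
    """Emit QuickStatements V1 (tab-separated) commands that create one item.
    'labels' maps language codes to label strings."""
    lines = ["CREATE"]
    for lang, text in labels.items():
        lines.append(f'LAST\tL{lang}\t"{text}"')
    return "\n".join(lines)

print(create_commands({
    "en": "Croatian Physical Society",
    "hr": "Hrvatsko fizikalno društvo",
}))
```

Statements such as “instance of” can be appended as further `LAST` lines once the right classes have been chosen for each society.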

QUICKSTATEMENTS

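I used the QuickStatements tool to add these references to Wikidata. Each command adds an “instance of” (P31) statement and attaches a reference made up of “stated in” (P248) and “point in time” (P585). The CSV-formatted syntax looks like this: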

qid,P31,S248,s585
Q1376656,Q45400320,Q55823083,+2013-00-00T00:00:00Z/9
Q8035326,Q45400320,Q55823083,+2013-00-00T00:00:00Z/9
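Rather than typing these rows by hand, you can generate them from the spreadsheet’s Q column. In QuickStatements’ CSV format, the header row names the properties and each following row holds one item’s values. A column prefixed with an uppercase S starts a reference, and a lowercase s adds another property to that same reference. The two values Q45400320 and Q55823083 are copied from the example. The date uses precision 9 (year), because only the year 2013 is given. This is a sketch, so check the output against the QuickStatements help page before you submit it.

```python
import csv
import io

INSTANCE_OF_VALUE = "Q45400320"  # P31 value from the example
STATED_IN = "Q55823083"          # P248 reference value from the example
POINT_IN_TIME = "+2013-00-00T00:00:00Z/9"  # /9 = year precision

def quickstatements_csv(qids):
    """One row per QID: a P31 statement with a reference of
    S248 (stated in) plus s585 (point in time) in the same reference."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["qid", "P31", "S248", "s585"])
    for qid in qids:
        writer.writerow([qid, INSTANCE_OF_VALUE, STATED_IN, POINT_IN_TIME])
    return buf.getvalue()

print(quickstatements_csv(["Q1376656", "Q8035326"]), end="")
```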

RESULTS

A dynamic list of societies in Wikidata with references to the SOAR Catalog can be found here: https://w.wiki/8Cw
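A list like this comes from a query to the Wikidata Query Service. Below is a sketch of one such query, wrapped in Python. It assumes, as in the QuickStatements example above, that the reference is attached to the P31 statement and points to Q55823083 through “stated in” (P248). It is not necessarily the query behind the short link. The User-Agent is a placeholder, and `run_query` needs network access.

```python
import json
import urllib.parse
import urllib.request

SOAR_ITEM = "Q55823083"  # the "stated in" target in the QuickStatements example

# Items whose P31 statement carries a reference "stated in" SOAR_ITEM.
QUERY = f"""
SELECT DISTINCT ?item ?itemLabel WHERE {{
  ?item p:P31 ?statement .
  ?statement prov:wasDerivedFrom ?ref .
  ?ref pr:P248 wd:{SOAR_ITEM} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""

def run_query(query=QUERY):
    """Send the query to the Wikidata Query Service (network access required)."""
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    # Wikimedia asks clients for a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "soar-sketch/0.1 (placeholder)"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return [
        (b["item"]["value"], b["itemLabel"]["value"])
        for b in data["results"]["bindings"]
    ]
```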

NEXT STEPS

The next phase of this effort will address Wikidata representation of the 1,043 journals listed in the SOAR Catalog.

RECOMMENDED RESOURCES

OpenRefine reconciliation guide by Antonin Delpeuch
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation

OpenRefine Google group
https://groups.google.com/forum/#!forum/openrefine

Google Sheets function list
https://support.google.com/docs/table/25273

QuickStatements help
https://www.wikidata.org/wiki/Help:QuickStatements#Add_statement_with_sources

Societies and Open Access Research project, part of the Harvard Open Access Project
https://bit.ly/hoap-soar