Dataverse Lightning Talk at LibrePlanet 2017

On March 26, 2017 I gave an impromptu five-minute “lightning” talk at LibrePlanet 2017 at MIT. I was one of perhaps half a dozen people who jumped up and talked about their open source project. I’m glad that it was recorded because I think it turned out OK! What do you think?

Here’s a transcript of the talk:

Hi, my name’s Phil Durbin. I work down the street at Harvard on an open source project called Dataverse. It’s Apache-licensed and I don’t have anything prepared so I’m going to keep it short and just open it to questions. The problem that it solves is that your tax money is going toward research and hopefully the outputs of that research are being put into open access journals, open access articles. But what about the data that’s associated with that research? Dataverse is a platform for hosting research data. In the academic world, you write a paper, you get a DOI for that paper, a digital object identifier, to uniquely identify your paper. But then, if you have some data associated with your paper, what do you do with it? Do you just throw it on your website? Is that website going to be around in 30 years? That’s the problem that we’re trying to solve, having a permanent place for research data.

If you scroll down on this page [ ], you’ll see that we have about twenty or so installations across the world that run our software. We have a conference coming up in June just down the street at Harvard. We have APIs for getting data in and out of Dataverse. We integrate with a number of other academic research-oriented sort of things. You see journals up here. Open Journal Systems is a way to host a journal online. We integrate with them so that authors of papers can deposit seamlessly from Open Journal Systems into our platform. There’s another piece of software, also open source, called Open Science Framework and if you’re using that to manage your research lifecycle you can publish data into Dataverse from there. A new integration is RSpace. It’s more of a lab notebook and so if you have all of your research in a lab notebook like RSpace, then you can publish your data into Dataverse.

Again, I don’t have too much more prepared. I could go on and on but I see a question in the back. [Inaudible.] That’s a good question. Dataverse is the software, right, but really the question you’re asking is, “Is the institution that hosts the Dataverse software going to be around in 30 years?” The plan is for Harvard to be around in 30 years. In the case of Harvard, where I work, we eat our own dogfood. We run this thing in production. It’s hosted by the Harvard Library. It’s hard for me to… I’m just a developer on the project. It’s going to depend on the institution of course, but the software, under the name Dataverse, is about 10 years old, and the trend has only been more and more adoption within Harvard and across the world so I think we’ll be around. I hope so.

Next question. [Inaudible.] Yeah, sure, we are always interested in partnering. We actually have a feature that we call “harvesting” which is based on a protocol called OAI-PMH. It’s really more for discoverability where if someone else installs Dataverse or any other platform that implements this protocol you can harvest the metadata about the dataset between different installations so you can know that data exists elsewhere. We also have some cool data exploration tools so you can run statistical analysis on tabular data. We also have geospatial mapping of datasets as well.
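
(Not part of the talk.) As a rough illustration of the harvesting described above, here is a minimal sketch in Python of an OAI-PMH client: it builds a `ListRecords` request URL and pulls identifiers and titles out of a response. The base URL, the function names, and the sample response are all made up for illustration; only the `verb`, `metadataPrefix`, and `set` parameters and the XML namespaces come from the OAI-PMH protocol itself. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier and title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=OAI_NS)
        records.append({"identifier": identifier, "title": title})
    return records


# A made-up response, shaped like what an OAI-PMH server returns.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>doi:10.5072/FK2/EXAMPLE</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Made-up example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # The base URL is a placeholder, not a real installation.
    print(list_records_url("https://dataverse.example.org/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], "-", rec["title"])
```

Note that only the metadata travels this way. The harvesting side learns that a dataset exists and where it lives, while the data files stay with the original installation.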

Another question? [Inaudible.] Again, we tend to replicate the metadata, not the actual data itself, but I imagine if you were truly going under, the ship is sinking, you could probably reach out to one of the other twenty installations of Dataverse and say, “Hey, we’re going to go dark, can someone take our data?” It would be more of an arrangement between institutions I think, but there’s a growing Dataverse community so I would think that someone would pick up the slack, so to speak.

I tweeted about the talk at… if you would like to discuss it there.

The video can also be found at and is a direct link to the YouTube video above.

The transcript can also be found at and if you notice a typo in it, please send me a pull request at .
