Dataverse Lightning Talk at LibrePlanet 2017

On March 26, 2017 I gave an impromptu five-minute “lightning” talk at LibrePlanet 2017 at MIT. I was one of perhaps half a dozen people who jumped up and talked about their open source project. I’m glad that it was recorded because I think it turned out ok! What do you think?

Here’s a transcript of the talk:

Hi, my name’s Phil Durbin. I work down the street at Harvard on an open source project called Dataverse. It’s Apache-licensed and I don’t have anything prepared so I’m going to keep it short and just open it to questions. The problem that it solves is that your tax money is going toward research and hopefully the outputs of that research are being put into open access journals, open access articles. But what about the data that’s associated with that research? Dataverse is a platform for hosting research data. In the academic world, you write a paper, you get a DOI for that paper, a digital object identifier, to uniquely identify your paper. But then, if you have some data associated with your paper, what do you do with it? Do you just throw it on your website? Is that website going to be around in 30 years? That’s the problem that we’re trying to solve, having a permanent place for research data.

If you scroll down on this page [ dataverse.org ], you’ll see that we have about twenty or so installations across the world that run our software. We have a conference coming up in June just down the street at Harvard. We have APIs for getting data in and out of Dataverse. We integrate with a number of other academic research-oriented sort of things. You see journals up here. Open Journal Systems is a way to host a journal online. We integrate with them so that authors of papers can deposit seamlessly from Open Journal Systems into our platform. There’s another piece of software, also open source, called Open Science Framework and if you’re using that to manage your research lifecycle you can publish data into Dataverse from there. A new integration is RSpace. It’s more of a lab notebook and so if you have all of your research in a lab notebook like RSpace, then you can publish your data into Dataverse.

Again, I don’t have too much more prepared. I could go on and on but I see a question in the back. [Inaudible.] That’s a good question. Dataverse is the software, right, but really the question you’re asking is, “Is the institution that hosts the Dataverse software going to be around in 30 years?” The plan is for Harvard to be around in 30 years. In the case of Harvard, where I work, we eat our own dogfood. We run this thing in production. It’s hosted by the Harvard Library. It’s hard for me to… I’m just a developer on the project. It’s going to depend on the institution of course, but the software, under the name Dataverse, is about 10 years old, and the trend has only been more and more adoption within Harvard and across the world so I think we’ll be around. I hope so.

Next question. [Inaudible.] Yeah, sure, we are always interested in partnering. We actually have a feature that we call “harvesting” which is based on a protocol called OAI-PMH. It’s really more for discoverability where if someone else installs Dataverse or any other platform that implements this protocol you can harvest the metadata about the dataset between different installations so you can know that data exists elsewhere. We also have some cool data exploration tools so you can run statistical analysis on tabular data. We also have geospatial mapping of datasets as well.

Another question? [Inaudible.] Again, we tend to replicate the metadata, not the actual data itself, but I imagine if you were truly going under, the ship is sinking, you could probably reach out to one of the other twenty installations of Dataverse and say, “Hey, we’re going to go dark, can someone take our data?” It would be more of an arrangement between institutions I think, but there’s a growing Dataverse community so I would think that someone would pick up the slack, so to speak.
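
A note on the harvesting mentioned in the transcript: OAI-PMH is a plain HTTP protocol in which a harvester sends a request with a `verb` parameter (such as `ListRecords`) and gets back XML containing metadata records. The Python sketch below is illustrative only and is not taken from the Dataverse code. The base URL `https://dataverse.example.edu/oai`, the sample identifier, and the helper names `list_records_url` and `parse_records` are all made up for the example, and it parses a hand-written sample response rather than contacting a real server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract the identifier and title from each record in a response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
        })
    return records


# Hand-written sample response; the identifier and title are invented.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>doi:10.5072/FK2/EXAMPLE</identifier>
        <datestamp>2017-03-26</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    # Hypothetical endpoint; substitute the OAI-PMH URL of a real installation.
    print(list_records_url("https://dataverse.example.edu/oai"))
    for record in parse_records(SAMPLE):
        print(record["identifier"], "-", record["title"])
```

Because only metadata is exchanged, the harvesting installation learns that a dataset exists and where to find it, which matches the point in the talk that the metadata is replicated rather than the data itself.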

I tweeted about the talk at https://twitter.com/philipdurbin/status/… if you would like to discuss it there.

The video can also be found at https://media.libreplanet.org/u/libreplanet/m/lightning-talk-philip-durbin/ and https://www.youtube.com/watch?v=-GUr-cd_OWQ is a direct link to the YouTube video.

The transcript can also be found at http://wiki.greptilian.com/talks/2017/libreplanet-dataverse-lightning-talk and if you notice a typo in it, please send me a pull request at https://github.com/pdurbin/wiki .

Posted in Uncategorized

DVN 3: Dataverse back in 2013

Please note: This content was originally hosted at http://people.iq.harvard.edu/~pdurbin but that site has gone dark and I wanted to preserve what I wrote between December 2012 and May 2013. I had just begun working as a developer for Dataverse and this was my write-up of what Dataverse was back then. I was getting oriented with the features offered, the code, the community, and the ecosystem. Throughout you’ll see references to “DVN” because that’s what we called the Dataverse software back then. It stood for “Dataverse Network” and we called it “DVN 3.” The software has since been rewritten and rebranded as just “Dataverse”.

It’s surprising how many of the links in the post no longer work. We managed to get the domain “dataverse.org” (replacing “thedata.org”) and we did a rewrite, which included some rebranding. Here are updated links:

Ok, on to the old post, last updated in 2013:


Philip Durbin, Software Developer

Philip Durbin
  • open source
  • data
  • collaboration
  • community

I work on The Dataverse Network Project ( http://thedata.org ), an open source web application for sharing, citing, analyzing, and preserving research data.


If you have research data… you can host it for FREE at http://thedata.harvard.edu 🙂

dataverses

A “dataverse” is simply a container, a place to upload your data: http://en.wikipedia.org/wiki/dataverse

 

viz example

If you have time series data (on recession trends for example) you can have your DVN visualize it (as above) by following http://guides.thedata.org/book/data-visualization

DVN can provide descriptive statistics of your data. Here’s the age variable from a census of Utah in 1880:

 

R example

 

On tabular and network data, you can perform statistical analysis by following http://guides.thedata.org/book/subset-and-analysis

http://dvn-demo.iq.harvard.edu is a great place to test out the DVN software. Go ahead and upload some data and play around. 🙂


The Dataverse Network (DVN) software is open source. The code is hosted at https://github.com/iqss/dvn and bugs are tracked at http://redmine.hmdc.harvard.edu/projects/dvn

Your institution is welcome to download and set up their own Dataverse Network installation on their own server. If you need help installing your DVN, please email us at support@thedata.org

If you don’t have a server handy, you can try installing a DVN on a virtual machine on your laptop with https://github.com/pdurbin/dvn-vagrant or https://github.com/dvn/dvn-install-demo . Please don’t use this in production. 🙂

If you’d like to contribute code, please see http://devguide.thedata.org

If you’d like to work on bugs that have been assigned to me, please be my guest. 🙂

I tend to work on the business logic:

JSF diagram

(Image from http://blog.xebia.fr/2009/06/03/seam-repenser-larchitecture-des-applications-web/ )


If you’d like to get involved with the DVN community, you can check out our tweets at http://twitter.com/thedataorg or join the mailing list at http://groups.google.com/group/dataverse-community

I started a Google+ page for DVN and I sometimes chat with people in #dvn on Freenode: http://irclog.iq.harvard.edu/dvn


The Dataverse Network is one of many products developed by The Institute for Quantitative Social Science (IQSS) at Harvard University: http://www.iq.harvard.edu/products

The source for many IQSS projects can be found under https://github.com/iqss but see http://iqss.github.com/github-at-iqss for a more complete list.

HUIT LTS

Both DVN installations at Harvard (the IQSS Dataverse Network and the Harvard-Smithsonian Astronomy Dataverse Network) are ably hosted by Harvard University Information Technology (HUIT) Library Technology Services (LTS): http://library.harvard.edu/project-update-dataverse

From time to time I check http://bugz.hul.harvard.edu/buglist.cgi?product=Dataverse for anything LTS might need from me.

Earth from http://www.flickr.com/photos/donkeyhotey/5679642883/

http://thedata.org lists Dataverse Networks around the world.


This web page is written in Markdown and rendered into HTML with Jekyll. The source can be found at https://git.huit.harvard.edu/pdurbin/pdurbiniq


My personal website is http://greptilian.com


Hello world!

Thanks for visiting! Please see the about page for more about me as well as my personal website at greptilian.com. As of this writing I work at IQSS on Dataverse. I like it there. 🙂

This being the first post and all, I guess I can get a little meta. I created a blog here at blogs.harvard.edu because of the retirement of the hosting I was using previously. Hosting at scholar.harvard.edu was suggested to me but I was turned off by the description on the homepage that read, “OpenScholar@Harvard is a free web site building tool available to faculty, graduate students and visiting scholars at Harvard.” I’m a staff member and while I appreciate research, scholarship, and science generally, I don’t consider myself a scholar. I don’t really consider myself much of a blogger either but when I googled for “Harvard blogs” I was happy to discover that I had missed the news that blogs.law.harvard.edu had dropped “law” from the URL, becoming this site. I think it’s fantastic that Harvard is offering a blogging platform for anyone with a harvard.edu address. Thanks!

Technically, I still have an old blog at people.fas.harvard.edu/~pdurbin/blog but I haven’t updated it since I switched to my current job in late 2012. Also, there’s a blog for my current project at dataverse.org/blog but I’m not the one who writes those posts. I’m sure I’d be welcome to be a guest blogger there but I like having my own space to have my own voice. For example, I’m considering giving a talk at the Harvard IT Summit some day but I thought that perhaps I would start with a blog post to gauge interest in various topics and reach a wider audience.

Oh, I don’t plan to enable comments on this blog. If you spot a typo or otherwise want to reach me, my contact information is on my about page.

Phew. I think that’s enough meta for now. 🙂
