A productive hackathon: Making data more FAIR in Single Cell Portal
By Eric Weitz, software engineer, Data Sciences Platform at the Broad Institute
Have you heard of the FAIR principles? They are a set of guidelines proposed as part of a growing movement to make data more Findable, Accessible, Interoperable, and Reusable. As this movement gains traction, we are seeing more FAIR-related activities at major meetings and conferences. For example, the recent Bio-IT World meeting in Boston included a conference track dedicated to FAIR, as well as a hackathon.
I was part of a team of four people from the Broad Institute's Data Sciences Platform that participated in the Bio-IT hackathon. Our goal: make data more FAIR in Single Cell Portal, which is built on top of FireCloud. In addition to improving the Single Cell Portal’s scientific data management, the hackathon also gave our team a chance to work with developers from other organizations in a manner that was uniquely nimble.
The team set out to “FAIRify” metadata describing workflows in Single Cell Portal (SCP). Workflows are an alpha feature in SCP, built on the FireCloud API, that lets users run pipelines like 10X’s Cell Ranger in the SCP web app rather than from a command-line interface. The workflow metadata conforms to the Human Cell Atlas metadata schema and records concrete data on inputs, hardware provisioning, and versioned software dependencies for a given submission. This machine-readable documentation is essential for reuse and reproducibility. Our main goal at the hackathon was to make this metadata more findable via API and accessible to public users.
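To make the idea of machine-readable analysis metadata concrete, here is a minimal sketch in Python of a record covering the three things mentioned above: inputs, hardware provisioning, and versioned software dependencies. Every name and value in it (`AnalysisMetadata`, the field names, the example bucket path and versions) is a hypothetical placeholder chosen for illustration. It is not the Human Cell Atlas schema and not the format SCP actually emits.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class AnalysisMetadata:
    """Hypothetical record; field names are illustrative, not the HCA schema."""
    workflow_name: str
    workflow_version: str
    inputs: dict
    hardware: dict
    software_dependencies: list = field(default_factory=list)


def to_json(record: AnalysisMetadata) -> str:
    # Sorted keys keep the output stable, so two records are easy to diff.
    return json.dumps(asdict(record), sort_keys=True, indent=2)


# All values below are made-up placeholders.
record = AnalysisMetadata(
    workflow_name="example-workflow",
    workflow_version="0.0.1",
    inputs={"fastq": "gs://example-bucket/sample.fastq.gz"},
    hardware={"cpu": 4, "memory_gb": 16},
    software_dependencies=[{"name": "example-tool", "version": "1.2.3"}],
)
serialized = to_json(record)
```

Because the record serializes to plain JSON, it could in principle be served from an API endpoint and parsed by any client, which is the kind of findability and accessibility the hackathon goal was about.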
Little did we know, the hackathon had more plans in store for us! Attendance boomed, and instead of having 2-3 people from outside Broad join our team as initially planned, we had 18. Jonathan Bistline, technical lead of Single Cell Portal, took charge of designing and building the feature for our main goal. The remaining three Broadies (Alexander Baumann and Kate Voss from FireCloud, and me from SCP) got to work onboarding and brainstorming with colleagues from other institutions on ways to better align data in Single Cell Portal with FAIR principles.
Our unanticipated collaborations proved fruitful. Kate worked with engineers from Ibsen, Addgene and Illumina to prototype FAIR enhancements to our study creation UI, such as enabling users to choose an accessible usage license. Alex worked with an engineer from Genentech on a WDL script to chain together Python scripts developed by engineers from Illumina, Ontoforce, NCBI, and other organizations. Jon knocked out our main analysis metadata feature and then some. I helped start and coordinate these mini-teams, got others plugged in, and taught several people how to use Git.
[Image captions: Jonathan Bistline made analysis metadata more findable and accessible. Kate Voss and developers from other organizations prototyped a UI to enable users to add more scientific metadata and accessible usage licensing.]
The event was a success. In two days, we implemented a real feature, prototyped two additional features, and explored a few potential new features. We measurably and substantially increased the FAIR Metrics score of Single Cell Portal. The hackathon organization and facilitation by Bio-IT, GO FAIR, NCBI and Broad gave us the participants, FAIR guidance and structure that we needed. Our diverse and not-so-small group of engineers, product managers, and scientists hacked together improvements to software for genomics research in single cell RNA-seq, and gleaned skills from each other in the process.
You can find more details on our team’s experience in the hackathon report, or in our hackathon GitHub repo. Keep an eye on Single Cell Portal as we merge this hackathon work with our other upcoming features, and watch our blog and Twitter for news about future hackathons!