Springtime of GATK4: pipelines and machine learning at Bio-IT
It's finally spring in Boston: the trees are sprouting leaves again, everything is turning green and gloriously alive, and Bio-IT World is starting, which makes it official! Many of you may not know or care about Bio-IT, since it's more a biotech trade show than a scientific meeting, but for us it has become a springtime tradition to announce important developments there. These announcements have often focused on strategic, roadmap-level plans; last year, for example, that's where we broke the news that GATK4 would be fully open-source, to a standing ovation (whoo!). This year, though, we're in a position to talk about the new capabilities we're actually delivering, and that feels really good. To quote the inevitable Steve Jobs, real artists ship, and boy are we shipping.
We have two major themes that we're developing this year: (1) democratization of the Best Practices pipelines, which covers everything from broader access and ease of deployment to standardization and optimization for cost and speed; and (2) application of machine learning to improve accuracy and scalability in established pipelines, as well as to tackle new areas like germline CNV discovery.
I'm including a schedule of GATK-related talks and demos below for the tiny minority of you who will actually be there in person; for the rest, we plan to post summaries of the key information over the next few days, right here on the GATK blog, so I promise you won't miss out on the important stuff!
Speaking of which, you might have seen this little item in the news today about Illumina acquiring Edico Genome, with a quote from our very own Anthony Philippakis, who directs our mothership, the Broad's Data Sciences Platform (emphasis mine):
“The scientific community should align around standards to maximize the impact of genomics in health,” said Anthony Philippakis, MD, PhD, Chief Data Officer of the Broad Institute of MIT and Harvard. “We are excited to collaborate with Illumina on approaches and pipelines for the analysis of NGS data. The Genome Analysis Toolkit (GATK) has been adopted by a diverse set of researchers, and we look forward to integrating these methods with Illumina sequencers to improve the overall efficiency of data analysis —enabling the community to more easily share and collaborate.”
You can watch a short video clip of Anthony Philippakis talking about what this means for GATK here; or, if you're down at the Seaport today, I'm happy to meet up and discuss in person.
Bio-IT World schedule for GATK & friends
| Day | Talk | Speaker | Time | Location |
|---|---|---|---|---|
| Tuesday 15 May | GATK4 pipelines & machine learning | Geraldine VdA | 5:00-5:30 | Google Cloud booth |
| Tuesday 15 May | Pipelining on FireCloud | Geraldine VdA | 5:30-6:00 | Google Cloud booth |
| Wednesday 16 May | Pipelining with WDL and Cromwell | Jeff Gentry | 11:00-11:30 | Google Cloud booth |
| Wednesday 16 May | Machine learning in GATK4 | Lee Lichtenstein | 12:00-12:30 | Skyline |
| Wednesday 16 May | Democratizing the GATK4 pipelines | Geraldine VdA | 12:40-1:10 | Cityview 2 |
| Wednesday 16 May | Pipelining with WDL and Cromwell | Jeff Gentry | 3:00-3:30 | Google Cloud booth |
| Wednesday 16 May | Pipelining on FireCloud | Geraldine VdA | 3:30-4:00 | Google Cloud booth |
| Wednesday 16 May | GATK4 pipelines & machine learning | Geraldine VdA | 4:00-4:30 | Google Cloud booth |
| Wednesday 16 May | Pipelining on FireCloud | Geraldine VdA | 5:00-5:30 | Google Cloud booth |
| Thursday 17 May | FAIR data on the cloud | Geraldine VdA | 11:40-12:10 | Waterfront 3 |
The Google Cloud booth is in the middle lane of the exhibition hall close to the registration desk.