Springtime of GATK4: pipelines and machine learning at Bio-IT

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited May 15 in Announcements

It's finally Spring in Boston; the trees are sprouting leaves again, everything is turning green and gloriously alive -- and Bio-IT World is starting, which makes it official! Many of you may not know or care about Bio-IT, since it's more a biotech trade show than a scientific meeting, but for us it has become a springtime tradition to announce important developments there. These announcements have often focused on strategic or roadmap-level plans -- for example, that's where we announced last year, to a standing ovation (whoo!), that GATK4 would be fully open-source -- but this year we're in a position to talk about the new capabilities we're actually delivering, and that feels really good. To quote the inevitable Steve Jobs, real artists ship, and boy are we shipping.

We have two major themes that we're developing this year: (1) democratization of the Best Practices pipelines, which includes everything from increasing access to ease of deployment, standardization and optimization for cost and speed; and (2) application of machine learning to improve accuracy and scalability in established pipelines as well as tackle new areas like germline CNV discovery.


I'm including a schedule of GATK-related talks and demos below for the tiny minority of you who will actually be there in person; for the rest, we plan to post summaries of the key information over the next few days, right here on the GATK blog, so I promise you won't miss out on the important stuff!

Speaking of which, you might have seen this little item in the news today about Illumina acquiring Edico Genome, with a quote from our very own Anthony Philippakis, who directs our mothership, the Broad's Data Sciences Platform (emphasis mine):

“The scientific community should align around standards to maximize the impact of genomics in health,” said Anthony Philippakis, MD, PhD, Chief Data Officer of the Broad Institute of MIT and Harvard. “We are excited to collaborate with Illumina on approaches and pipelines for the analysis of NGS data. The Genome Analysis Toolkit (GATK) has been adopted by a diverse set of researchers, and we look forward to integrating these methods with Illumina sequencers to improve the overall efficiency of data analysis -- enabling the community to more easily share and collaborate.”

You can watch a short video clip of Anthony Philippakis talking about what this means for GATK here; or if you're down at the Seaport today I'm happy to meet up and discuss in person :)


Bio-IT World schedule for GATK & friends

Talk | Speaker | Time | Location

Tuesday 15 May
GATK4 pipelines & machine learning | Geraldine VdA | 5:00-5:30 | Google Cloud booth
Pipelining on FireCloud | Geraldine VdA | 5:30-6:00 | Google Cloud booth

Wednesday 16 May
Pipelining with WDL and Cromwell | Jeff Gentry | 11:00-11:30 | Google Cloud booth
Machine learning in GATK4 | Lee Lichtenstein | 12:00-12:30 | Skyline
Democratizing the GATK4 pipelines | Geraldine VdA | 12:40-1:10 | Cityview 2
Pipelining with WDL and Cromwell | Jeff Gentry | 3:00-3:30 | Google Cloud booth
Pipelining on FireCloud | Geraldine VdA | 3:30-4:00 | Google Cloud booth
GATK4 pipelines & machine learning | Geraldine VdA | 4:00-4:30 | Google Cloud booth
Pipelining on FireCloud | Geraldine VdA | 5:00-5:30 | Google Cloud booth

Thursday 17 May
FAIR data on the cloud | Geraldine VdA | 11:40-12:10 | Waterfront 3

The Google Cloud booth is in the middle lane of the exhibition hall, close to the registration desk.
