GATK 4.1 Release
I'm delighted to introduce the first major version update to GATK4, version 4.1! This release includes several exciting new analysis pipelines and tons of improvements to existing tools, many of which are now officially out of beta (YAY!).
You can check out the full release notes on GitHub to get a sense of the scale of this release, but fair warning, it's a lot. In fact, we felt there was far too much in this release to give a satisfying overview in a single blog post, so we decided to write a series of nine blog posts, each covering one of the main functional areas of improvement. The table below lists the nine posts along with a short summary of each. Each post was written by the lead developer(s) on that project and outlines the history of the challenge at hand, the approach they developed to solve it, and future development prospects.
We plan to publish two posts per week starting tomorrow. To keep an eye out for them, subscribe to forum notifications or follow @gatk_dev on Twitter! We'll add links to the table as the posts become available.
And now, without further ado, I present to you GATK 4.1!!!
|Short Variant Caller Roundup||Turbocharging Germline Short
|New Features and Improvements
Mutect2 and HaplotypeCaller both aim to achieve sensitive SNP and indel discovery, though in very different contexts. Despite their different applications, they're more closely related than first meets the eye. GATK 4.1 features several performance and accuracy improvements, spurred by Mutect2 development and simultaneously benefiting both tools. We're also debuting a new beta version of GVCF mode for Mutect2, bringing the HaplotypeCaller's reference confidence model to somatic analysis.
|Be big, feel small!
The Broad generates 20 terabytes of data every day, so it is no surprise that we focus much of our efforts in the germline space on processing more data more efficiently. While efficiency improvements in GATK 4.1 satisfy users with the largest cohorts (think All of Us), rest assured we aren't discounting smaller cohorts! See how GATK 4.1 facilitates generating larger, cheaper germline cohort callsets and improves accuracy and usability for single-sample clinical cases.
|Expanding the use cases for a proven tool
Enhanced sensitivity and precision allow GATK 4.1’s Mutect2 to encompass previously challenging domains, including mitochondria, cfDNA, and multiple tumor samples. We’ve improved performance and accuracy in single-sample calling, and have ambitious plans for more progress.
|New! Mutect2 for
|New! Mutect2 for Liquid Biopsy||Spark Improvements|
|Overcoming barriers to understanding the mitochondrial genome
Calling SNPs and indels on the mitochondrial genome poses unique challenges because of its circular shape and very high copy number. We now have a tested and validated “Best Practices” pipeline using Mutect2 to call short variants at arbitrary allele fractions in the mitochondrial genome.
|Adapting a proven tool to liquid
Coming soon: a pipeline using Mutect2 for low allele fraction variant detection from duplex-sequenced liquid biopsies. Liquid biopsies present novel challenges because they require high sensitivity at low allele fraction. With a few minor adjustments to the parameters passed to Mutect2 and the addition of a new filter, our pipeline achieves more than 90% sensitivity at ~1% allele fraction with fewer than 1 false positive per Mb on three separate panels with territory as large as 2 Mb.
|Delivering results faster
With GATK 4.1, we continue to improve our support for users who want to run on Apache Spark. This release includes major improvements to MarkDuplicatesSpark in particular, as well as to the full ReadsPipelineSpark, powered by a brand new Spark I/O library, Disq!
|CNV out of Beta!||Funcotator out of Beta!||CNN out of Beta!|
|A production-ready tool to call copy-number variants
In the current stage of evolution, we can still see traits inherited from venerable ancestors in the ModelSegments and GermlineCNVCaller pipelines. However, the GATK 4.1 pipelines also feature new adaptations that dramatically improve performance and enable scalability from exomes to genomes. The GATK 4.1 release brings these pipelines out of beta, officially adding CNV calling to GATK’s growing set of capabilities.
|A production-ready tool to predict variant function
We created Funcotator to be a fast and accurate functional annotation tool. The latest release of GATK includes updates to Funcotator that make it more robust and correct, as well as more flexible and production-ready. The addition of two sets of data sources to go with Funcotator (including GENCODE, ClinVar, gnomAD, and more) enables it to be used out of the box to annotate either germline or somatic variants.
|A production-ready suite of tools for single-sample variant filtration
We present the CNNVariant suite of tools, a complement to VQSR for single-sample variant filtration. This toolset includes a pre-trained model that is ready to score variants, as well as the capability to train new models for new types of data. We gathered a massive amount of data to train our model, and validated its performance against different biological samples, sequencing machines, and protocols.