GATK 4.1.0.0 Release

bhanuGandham Member, Administrator, Broadie, Moderator
edited February 11 in Announcements

I'm delighted to introduce the first major version update to GATK4, version 4.1.0.0! This release includes several exciting new analysis pipelines and tons of improvements to existing tools, many of which are now officially out of beta (YAY!).

You can check out the full release notes on GitHub to get a sense of the scale of this release, but fair warning, it's a lot. In fact, we felt there was far too much in this release to give a satisfying overview in a single blog post, so we decided to develop a series of nine blog posts that each cover one of the main functional areas of improvement. The table below lists the nine posts along with a short summary of each. Each blog post was written by the lead developer(s) on that project and outlines the history of the challenge at hand, the approach they developed to solve it, and future development prospects.

We plan to publish two posts per week starting tomorrow, so keep an eye out for them: subscribe to forum notifications or follow @gatk_dev on Twitter! We'll add links to the table as the posts become available.

And now without further ado I present to you GATK4.1!!!



Short Variant Caller Roundup
Two Sisters!

Mutect2 and HaplotypeCaller both aim for sensitive SNP and indel discovery, though in very different contexts. Despite their different applications, they're more closely related than first meets the eye. GATK 4.1 features several performance and accuracy improvements that were spurred by Mutect2 development and benefit both tools. We're also debuting a new beta version of GVCF mode for Mutect2, bringing HaplotypeCaller's reference confidence model to somatic analysis.

Turbocharging Germline Short Variants Calling
Be big, feel small!

The Broad generates 20 terabytes of data every day, so it is no surprise that we focus much of our effort in the germline space on processing more data more efficiently. While the efficiency improvements in GATK 4.1 will satisfy users with the largest cohorts (think All of Us), rest assured we aren't discounting smaller cohorts! See how GATK 4.1 facilitates generating larger, cheaper germline cohort callsets and improves accuracy and usability for single-sample clinical cases.

New Features and Improvements in Mutect2
Expanding the use cases for a proven tool

Enhanced sensitivity and precision allow GATK 4.1’s Mutect2 to encompass previously challenging domains, including mitochondria, cfDNA, and multiple tumor samples. We’ve improved performance and accuracy in single-sample calling, and have ambitious plans for more progress.

New! Mutect2 for Mitochondrial Analysis
Overcoming barriers to understanding the mitochondrial genome

Calling SNPs and indels on the mitochondrial genome poses unique challenges because of its circular shape and very high copy number. We now have a tested and validated “Best Practices” pipeline using Mutect2 to call short variants at arbitrary allele fractions in the mitochondrial genome.

New! Mutect2 for Liquid Biopsy
Adapting a proven tool to liquid biopsy studies

Coming soon: a pipeline using Mutect2 for low allele fraction variant detection from duplex-sequenced liquid biopsies. Liquid biopsies present novel challenges, requiring high sensitivity at low allele fraction. With a few minor adjustments to the parameters passed to Mutect2 and the addition of a new filter, our pipeline achieves > 90% sensitivity at ~1% allele fraction with fewer than 1 FP per Mb on three separate panels with territory as large as 2 Mb.

Spark Improvements
Delivering results faster

We continue to improve our support for users who want to run GATK 4.1 on Apache Spark. This release includes major improvements to MarkDuplicatesSpark in particular, as well as to the full ReadsPipelineSpark, powered by a brand new Spark I/O library, Disq!

CNV out of Beta!
A production-ready tool to call copy-number variants

In the current stage of evolution, we can still see traits inherited from venerable ancestors in the ModelSegments and GermlineCNVCaller pipelines. However, the GATK 4.1 pipelines also feature new adaptations that dramatically improve performance and enable scalability from exomes to genomes. The GATK 4.1 release brings these pipelines out of beta, officially adding CNV calling to GATK’s growing set of capabilities.

Funcotator out of Beta!
A production-ready tool to predict variant function

We created Funcotator to be a fast and accurate functional annotation tool. The latest release of GATK includes updates that make Funcotator even more robust and correct, as well as flexible and production-ready. The addition of two sets of data sources to go with Funcotator (including GENCODE, ClinVar, gnomAD, and more) enables it to be used out of the box to annotate either germline or somatic variants.

CNN out of Beta!
A production-ready suite of tools for single-sample variant filtration

We present the CNNVariant suite of tools, a complement to VQSR for single-sample variant filtration. This toolset includes a pre-trained model, ready to score variants, as well as the capability to train new models for new types of data. We gathered a massive amount of data to train our model and validated its performance across different biological samples, sequencing machines, and protocols.




Comments

  • SkyWarrior Turkey Member ✭✭✭

    Awesome release roundup. However, we are still waiting for the much-desired changes to HaplotypeCaller (AKA missed calls due to the -L parameter).

  • hugolam Member

    Great and thanks. After the update to 4.1, I saw the following error with the "--resource" parameter in VariantRecalibrator:

    A USER ERROR has occurred: Couldn't read file file:///proj/hg19/omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf. Error was: It doesn't exist.

    The same command works in the previous version, 4.0.12.0. It seems that it is now prepending the current directory to the --resource value and treating the whole thing as a "file" object. Or has the API changed? Thanks!

  • cnorman United States Member, Broadie, Dev
    edited February 4

    @hugolam The command line syntax for "tagged" arguments such as --resource changed for 4.1. Instead of specifying the tags as part of the argument value, specify them as part of the argument name:

    --resource:known=false,training=true,truth=false,prior=12.0 /proj/hg19/omni.vcf
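For anyone updating scripts, here is a minimal Python sketch of that change. It assumes the 4.0-style value has the form TAGS:PATH, as the value in the error message above suggests, and that the tags themselves never contain a colon. The function name `to_gatk41_tagged_args` is hypothetical and not part of GATK.

```python
def to_gatk41_tagged_args(arg_name, old_value):
    """Turn a 4.0-style tagged value into 4.1-style command-line tokens.

    4.0 style (assumed): --resource TAGS:PATH
    4.1 style:           --resource:TAGS PATH
    """
    # Assumption: tags never contain ':', so the first colon separates the
    # tags from the path; later colons (e.g. in gs:// URIs) stay with the path.
    tags, sep, path = old_value.partition(":")
    if not sep or not tags or not path:
        raise ValueError(f"expected 'TAGS:PATH', got {old_value!r}")
    return [f"{arg_name}:{tags}", path]


old = "omni,known=false,training=true,truth=false,prior=12.0:/proj/hg19/omni.vcf"
tokens = to_gatk41_tagged_args("--resource", old)
print(" ".join(tokens))
# prints: --resource:omni,known=false,training=true,truth=false,prior=12.0 /proj/hg19/omni.vcf
```

The two tokens are returned separately so they can be appended to an argument list without any shell quoting.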

  • yingchen69 nanjing Member

    Hi, where is the documentation for the GATK4 mitochondria pipeline? The GitHub page (https://github.com/gatk-workflows/gatk4-mitochondria-pipeline) is blank. Best, Ying

  • leshwill Houston Member

    How do I make GenomicsDB workspaces by chromosome? Does the example -L 20 in the documentation mean chromosome 20? Thank you for your support.
