Release notes for GATK version 3.6
GATK 3.6 was released on June 1, 2016. Itemized changes are listed below. For more details, see the user-friendly version highlights.
Variant calling features
HaplotypeCaller will now emit a no-call (./.) for any sample where GQ is zero, in both normal and GVCF modes, instead of emitting a specific genotype in which we have zero confidence.
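To illustrate the no-call rule, here is a minimal sketch. It assumes the usual simplification that GQ is the gap between the lowest and second-lowest PL, capped at 99. It is not HaplotypeCaller's actual code. A GQ of zero means two genotypes are tied for best, so there is no basis for picking one.

```python
def genotype_quality(pls):
    """Assumed simplification: GQ is the gap between the two smallest PLs, capped at 99."""
    ordered = sorted(pls)
    return min(ordered[1] - ordered[0], 99)


def call_genotype(pls, genotypes):
    """Return the most likely genotype, or the no-call './.' when GQ is zero."""
    if genotype_quality(pls) == 0:
        # Two genotypes are equally likely: emit a no-call rather than a zero-confidence genotype.
        return "./."
    best_index = min(range(len(pls)), key=lambda i: pls[i])
    return genotypes[best_index]


# PL order for a diploid, biallelic site: 0/0, 0/1, 1/1
DIPLOID_BIALLELIC = ["0/0", "0/1", "1/1"]
print(call_genotype([0, 0, 40], DIPLOID_BIALLELIC))    # ./.
print(call_genotype([35, 0, 500], DIPLOID_BIALLELIC))  # 0/1
```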
GenotypeGVCFs will now emit a QUAL value for hom-ref sites when run in
Implemented tracking of dropped reads by HaplotypeCaller and MuTect2 (see highlights for details).
Assorted optimizations to the joint calling code, expected to speed up genotyping (not the overall tool run) by about 10 percent.
Enabled MuTect2 to annotate all the same regular (non-AS) annotations as HaplotypeCaller on request.
Assorted new functionality
- New ranksum annotations (allele-specific insert size and MQ of mate).
- -AS mode to run VQSR in an allele-specific manner in both VariantRecalibrator and ApplyRecalibration (still experimental).
- VariantRecalibrator can now output the recalibration model to a file (in GATKReport format; use the R library gsalib to read it).
- Added ability to have VariantRecalibrator retry building the recalibration model if it fails initially. Meant as a workaround for runs on small datasets that fail randomly because the model isn't robust enough. Default behavior remains a single try. Contributed by @depristo / Mark DePristo.
- ValidateVariants can now perform validation checks specific to GVCFs with the option
- VariantsToTable now determines each allele's type when -SMA are specified together.
- LeftAlignAndTrimVariants now retains genotypes that remain valid after splitting with --splitMultiallelics (previously all were discarded).
- SelectVariants can now select sites based on the number or fraction of samples that have no-call genotypes (./.) using
- DepthOfCoverage now supports collecting coverage statistics for overlapping exons/genes. Contributed by @seru71 / Pawel Sztromwasser.
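The SelectVariants no-call selection can be illustrated with a small sketch that counts no-call genotypes at a site and compares the count, or the fraction of samples, to a limit. The function and parameter names (`keep_site`, `max_no_calls`, `max_no_call_fraction`) are hypothetical stand-ins, not the real SelectVariants arguments. The real argument names are not given above.

```python
def is_no_call(gt: str) -> bool:
    """True when every allele in a VCF GT string is missing, e.g. './.' or '.|.'."""
    return all(allele == "." for allele in gt.replace("|", "/").split("/"))


def keep_site(genotypes, max_no_calls=None, max_no_call_fraction=None):
    """Hypothetical filter: keep a site only if its no-call count and fraction stay within the limits."""
    if not genotypes:
        return True
    n_no_call = sum(is_no_call(gt) for gt in genotypes)
    if max_no_calls is not None and n_no_call > max_no_calls:
        return False
    if max_no_call_fraction is not None and n_no_call / len(genotypes) > max_no_call_fraction:
        return False
    return True


site = ["0/1", "./.", "1/1", "./."]
print(keep_site(site, max_no_calls=1))            # False (2 no-calls)
print(keep_site(site, max_no_call_fraction=0.5))  # True (fraction is exactly 0.5)
```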
Assorted bug fixes
- Handling of allele depths when the NON_REF allele is non-zero (see highlights for details)
- A sample ploidy check that may have minor performance implications
- Threshold evaluation in the max alt alleles filter of MuTect2
- MQ annotation calculation when processing BP resolution GVCFs
- RankSum calculations on small sample sizes
- PrintReads’ ability to emit a @PG header record
- Writing GVCFs to stdout instead of to file
- Order of column headers in sample_gene_summary reports output by DepthOfCoverage
- MNP-merging behavior of ReadBackedPhasing: treatment of spanning deletions and consecutive SNPs
- SelectVariants and VariantFiltration’s ability to update genotype summary annotations (AC, AN and AF)
- Subsetting alleles from StrandAlleleCountsBySample annotation
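The AC, AN and AF fix concerns recomputing these summaries after samples are subset. A minimal sketch, assuming the standard VCF definitions: AN counts called alleles, AC counts each alternate allele, and AF = AC / AN. It shows why the values must be recomputed rather than copied. It is not the GATK implementation.

```python
from collections import Counter


def recompute_ac_an_af(genotypes, n_alt_alleles):
    """Recompute AC, AN and AF from GT strings, skipping missing ('.') alleles."""
    called = []
    for gt in genotypes:
        called.extend(int(a) for a in gt.replace("|", "/").split("/") if a != ".")
    counts = Counter(called)
    an = len(called)                                    # total number of called alleles
    ac = [counts[i] for i in range(1, n_alt_alleles + 1)]  # one count per alternate allele
    af = [c / an if an else None for c in ac]           # undefined when nothing is called
    return ac, an, af


all_samples = ["0/1", "1/1", "0/0", "0/1"]
subset = all_samples[:2]  # e.g. after selecting only the first two samples
print(recompute_ac_an_af(all_samples, n_alt_alleles=1))  # ([4], 8, [0.5])
print(recompute_ac_an_af(subset, n_alt_alleles=1))       # ([3], 4, [0.75])
```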
Workarounds for weird sites
Added an argument to HaplotypeCaller and GenotypeGVCFs, -maxNumPLValues, that controls the maximum number of PL values that can be emitted for a given site. If the number of PLs resulting from the combination of observed alleles and ploidy exceeds this value, no PLs will be emitted. This will cause subsetting errors in SelectVariants but empowers the user to identify and work around difficult sites where this happens.
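The number of PLs grows quickly with alleles and ploidy. There is one PL per possible genotype, and the genotypes are the multisets of size ploidy drawn from the observed alleles, so their count is C(alleles + ploidy - 1, ploidy). The sketch below shows that count and the cutoff described above. The threshold of 100 is only an illustrative value, not necessarily the tool's default.

```python
from math import comb


def num_pl_values(num_alleles: int, ploidy: int) -> int:
    """One PL per genotype; genotypes are multisets of size `ploidy` over `num_alleles` alleles."""
    return comb(num_alleles + ploidy - 1, ploidy)


def would_emit_pls(num_alleles: int, ploidy: int, max_num_pl_values: int) -> bool:
    """PLs are emitted only if their count does not exceed the limit."""
    return num_pl_values(num_alleles, ploidy) <= max_num_pl_values


print(num_pl_values(2, 2))         # 3   (diploid, ref + 1 alt)
print(num_pl_values(6, 2))         # 21  (diploid, ref + 5 alts)
print(num_pl_values(4, 10))        # 286 (ploidy 10, ref + 3 alts)
print(would_emit_pls(4, 10, 100))  # False with an illustrative limit of 100
```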
Extended the functionality of the engine-level argument --reference_window_stop to set the reference window size used by VariantAnnotator when annotating homopolymers through the HomopolymerRun annotation. This makes it possible to handle homopolymer stretches that are longer than the default window size.
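The window size matters because a homopolymer run can only be measured within the reference bases the annotator is given. This simplified sketch makes that concrete. It uses a hypothetical `flank` parameter as a stand-in for the window size and does not reproduce GATK's exact HomopolymerRun logic or the semantics of --reference_window_stop. A 12-base run is under-counted when the window is too narrow.

```python
def homopolymer_run(window: str, pos: int, base: str) -> int:
    """Simplified sketch: longest run of `base` extending left or right from `pos` (pos itself excluded)."""
    right, i = 0, pos + 1
    while i < len(window) and window[i] == base:
        right += 1
        i += 1
    left, i = 0, pos - 1
    while i >= 0 and window[i] == base:
        left += 1
        i -= 1
    return max(left, right)


def run_in_window(reference: str, pos: int, base: str, flank: int) -> int:
    """Measure the run using only `flank` reference bases on each side of `pos` (hypothetical window)."""
    start = max(0, pos - flank)
    window = reference[start:pos + flank + 1]
    return homopolymer_run(window, pos - start, base)


reference = "CCCCCCT" + "A" * 12 + "GGGGGG"  # variant at index 6 ('T'), alt allele 'A'
print(run_in_window(reference, 6, "A", flank=5))   # 5  (run truncated by the narrow window)
print(run_in_window(reference, 6, "A", flank=15))  # 12 (full run fits in the window)
```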
- Removed Phone Home usage tracking system (see highlights for details)
- Deprecated the GenotypeAndValidate tool, which was massively outdated and had no unit or integration tests
Tools moved to the open-source core of GATK
- IndelRealigner and RealignerTargetCreator
- Post-IR MQ reverter filter
- Moved BQSRGatherer and dependencies to the public module
Core / engine functionality
- Enabled Java 8 support (see highlights for details)
- Updated htsjdk & picard to version 2.4.1
- Tweaks to the genome coordinates parsing system and contig names to support the Hg38 reference
- Assorted improvements in the handling of errors, warnings and log output. The engine will now output a summary of WARN messages encountered during a run so you don’t have to parse the full log to see if anything worrying-but-not-fatal happened.
- Exposed the time between checks for whether new jobs can be submitted as a user-settable parameter on the CLI. Useful when testing pipelines, to shorten idle time. Contributed by @dakl / Daniel Klevebring.
- Removed mem_free from the resident memory request params for Queue because it doesn't work and wouldn't actually reserve memory.
- Improvements and clarifications to many tool docs
- Refreshed organization and naming of tool categories
- Fixed display of default values for arguments
- Switched default doc output to html to make the tool docs provided for nightly builds more readable