
Release notes for GATK version 3.6

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited December 2016 in Announcements

GATK 3.6 was released on June 1, 2016. Itemized changes are listed below. For more details, see the user-friendly version highlights.


Variant calling features

  • HaplotypeCaller will now emit a no-call (./.) for any sample where GQ is zero, in both normal and GVCF modes, instead of emitting a specific genotype in which we have zero confidence.

  • GenotypeGVCFs will now emit a QUAL value for hom-ref sites when run in -allSites mode.

  • Implemented tracking of dropped reads by HaplotypeCaller and MuTect2 (see highlights for details).

  • Assorted optimizations to the joint calling code, expected to speed up genotyping (not the overall tool run) by about 10 percent.

  • Enabled MuTect2 to annotate all the same regular (non-AS) annotations as HaplotypeCaller on request.
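
The new no-call behavior described in the first item of this list can be pictured with a minimal sketch. This is not GATK code; `emitted_genotype` is a hypothetical helper, and the diploid `./.` string is assumed for illustration.

```python
def emitted_genotype(called_gt: str, gq: int) -> str:
    """Return the genotype string to emit for one diploid sample.

    Hypothetical helper for illustration only; it mimics the rule that a
    genotype with GQ of zero is reported as a no-call.
    """
    if gq == 0:
        # Zero confidence: report a no-call instead of a specific genotype.
        return "./."
    return called_gt


print(emitted_genotype("0/1", 0))   # ./.
print(emitted_genotype("0/1", 35))  # 0/1
```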


Assorted new functionality

  • New ranksum annotations (allele-specific insert size and MQ of mate).
  • New -AS mode to run VQSR in an allele-specific manner (applies to both VariantRecalibrator and ApplyRecalibration; still experimental).
  • VariantRecalibrator can now output the recalibration model to a file (in GATKReport format; use the R library gsalib to read it).
  • Added ability to have VariantRecalibrator retry building the recalibration model if it fails initially. Meant as a workaround for runs on small datasets that fail randomly because the model isn't robust enough. Default behavior remains a single try. Contributed by @depristo / Mark DePristo.
  • ValidateVariants can now perform validation checks specific to GVCFs with the option --gvcf.
  • VariantsToTable now determines each allele's type when -F TYPE and -SMA are specified together.
  • LeftAlignAndTrimVariants now retains genotypes that remain valid after splitting with --splitMultiallelics (previously all were discarded).
  • SelectVariants can now select sites based on the number or fraction of samples that have no-call genotypes (./.) using --maxNOCALLnumber and --maxNOCALLfraction, respectively.
  • DepthOfCoverage now supports collecting coverage statistics for overlapping exons/genes. Contributed by @seru71 / Pawel Sztromwasser.
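
The no-call selection criteria mentioned for SelectVariants above can be sketched as follows. This is an illustration of the idea, not the tool's implementation: the function names are hypothetical, and treating the thresholds as inclusive maxima (a site is dropped only when it exceeds them) is an assumption.

```python
from typing import Optional, Sequence


def is_no_call(gt: str) -> bool:
    """True when every allele in the genotype string is missing ('.')."""
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def keep_site(
    genotypes: Sequence[str],
    max_nocall_number: Optional[int] = None,
    max_nocall_fraction: Optional[float] = None,
) -> bool:
    """Decide whether a site passes the no-call thresholds.

    Assumption: a site is kept unless its no-call count or fraction is
    strictly greater than the corresponding threshold.
    """
    n_no_call = sum(is_no_call(gt) for gt in genotypes)
    if max_nocall_number is not None and n_no_call > max_nocall_number:
        return False
    if max_nocall_fraction is not None and genotypes:
        if n_no_call / len(genotypes) > max_nocall_fraction:
            return False
    return True


site = ["0/1", "./.", "./.", "1/1"]  # two of four samples are no-calls
print(keep_site(site, max_nocall_number=1))       # False
print(keep_site(site, max_nocall_fraction=0.5))   # True
```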

Assorted bug fixes

  • Handling of allele depths when the NON_REF allele is non-zero (see highlights for details)
  • A sample ploidy check that may have minor performance implications
  • Threshold evaluation in the max alt alleles filter of MuTect2
  • MQ annotation calculation when processing BP resolution GVCFs
  • RankSum calculations on small sample sizes
  • PrintReads’ ability to emit a @PG header record
  • Writing GVCFs to stdout instead of to file
  • Order of column headers in sample_gene_summary reports output by DepthOfCoverage
  • MNP-merging behavior of ReadBackedPhasing: treatment of spanning deletions and consecutive SNPs
  • SelectVariants and VariantFiltration’s ability to update genotype summary annotations (AC, AN and AF)
  • Subsetting alleles from StrandAlleleCountsBySample annotation

Workarounds for weird sites

  • Added an argument to HaplotypeCaller and GenotypeGVCFs, -maxNumPLValues, that controls the maximum number of PL values that can be emitted for a given site. If the number of PLs resulting from the combination of observed alleles and ploidy exceeds this value, no PLs will be emitted. This will cause subsetting errors in SelectVariants but empowers the user to identify and work around difficult sites where this happens.

  • Extended the functionality of the engine-level argument --reference_window_stop to set the reference window size used by VariantAnnotator when annotating homopolymers through the HomopolymerRun annotation. This makes it possible to deal with the problem of homopolymer stretches that are longer than the default window size.
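
To see why the number of PLs can grow large enough to need the -maxNumPLValues cap, it helps to count genotypes: for A alleles (including the reference) and ploidy P, there are C(A + P - 1, P) possible unordered genotypes, and one PL per genotype. The sketch below computes that count and applies the "no PLs if the count exceeds the limit" rule from the first item above. The function names and the limit of 100 are hypothetical, chosen only for illustration.

```python
from math import comb  # available in Python 3.8+


def num_pl_values(num_alleles: int, ploidy: int) -> int:
    """Number of unordered genotypes, hence PL values, at a site.

    num_alleles counts the reference allele plus all alternate alleles.
    """
    return comb(num_alleles + ploidy - 1, ploidy)


def pls_emitted(num_alleles: int, ploidy: int, max_num_pl_values: int) -> bool:
    """PLs are emitted only when their count does not exceed the limit."""
    return num_pl_values(num_alleles, ploidy) <= max_num_pl_values


print(num_pl_values(3, 2))      # 6
print(num_pl_values(6, 10))     # 3003
print(pls_emitted(3, 2, 100))   # True
print(pls_emitted(6, 10, 100))  # False
```

The count grows quickly with ploidy, which is why sites with many alleles in high-ploidy or pooled samples are the ones most likely to hit the cap.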


Deleted functionality

  • Removed Phone Home usage tracking system (see highlights for details)
  • Deprecated the GenotypeAndValidate tool, which was massively outdated and had no unit or integration tests

Tools moved to the open-source core of GATK

  • IndelRealigner and RealignerTargetCreator
  • Post-IR MQ reverter filter
  • BQSRGatherer and its dependencies

Core / engine functionality

  • Enabled Java 8 support (see highlights for details)
  • Updated htsjdk & picard to version 2.4.1
  • Tweaks to the genome coordinates parsing system and contig names to support the Hg38 reference
  • Assorted improvements in the handling of errors, warnings and log output. The engine will now output a summary of WARN messages encountered during a run so you don’t have to parse the full log to see if anything worrying-but-not-fatal happened.

Queue

  • Exposed the time between checks for whether new jobs can be submitted as a user-settable parameter on the CLI. Useful for shortening idle time when testing pipelines. Contributed by @dakl / Daniel Klevebring.

  • Removed mem_free from the resident memory request parameters for Queue because it didn't work and wouldn't actually reserve memory.


Tool documentation

  • Improvements and clarifications to many tool docs
  • Refreshed organization and naming of tool categories
  • Fixed display of default values for arguments
  • Switched default doc output to html to make the tool docs provided for nightly builds more readable