Release notes for GATK version 2.6

ebanks, GATK Developer
edited June 2013 in Announcements

GATK 2.6 was released on June 20, 2013. Highlights are listed below. Read the detailed version history overview here: http://www.broadinstitute.org/gatk/guide/version-history

Important note: with this release the GATK has officially moved to using Java 7.

Reduce Reads

  • Small runtime performance improvements contributed by Michael McCowan.
  • Added fix for the "Removed too many insertions, header is now negative" bug.
  • Fixed bug that arises in multi-sample mode and causes the tool to crash.
  • Added --cancer_mode argument to force the user to explicitly enable multi-sample mode.

Unified Genotyper

  • Runtime performance improvements when calling indels; calling indels in a single sample is almost 2x faster in our tests.
  • Fixed bug for bad AD values in some cases.
  • Fixed bug for GENOTYPE_GIVEN_ALLELES mode where it silently fails to genotype indels in some cases.

Haplotype Caller

  • We have been working hard to reduce the number of false negatives (i.e. missed sites) for the Haplotype Caller and have added a number of improvements to this tool. The sensitivity is now better than that of the Unified Genotyper in all of our whole genome tests for both SNPs and indels. See the detailed version history for more information.
  • The Haplotype Caller now annotates IDs from dbSNP properly.
  • The Haplotype Caller now emits per-sample DP.
  • Fixed bug for bad AD values in some cases.
  • Fixed bug with error: "Only one of refStart or refStop must be < 0, not both" that arose from soft-clipped reads at the beginning of contigs.
  • Implemented a much improved version of GENOTYPE_GIVEN_ALLELES mode in the Haplotype Caller.

Indel Realigner

  • Fixed bug where secondary alignments were not being handled correctly.

Genotype Concordance

  • Added an overall genotype concordance metric to the output.
  • Fixed a bug in the printout of molten data in how it treated the genotypes.

Diagnose Targets

  • Diagnose Targets now has an option to output missing intervals.
  • Fixed bug where sometimes intervals were emitted out of order.

Base Recalibrator

  • Fixed bug for reads with indel CIGAR operators (I or D) at the start/end of the read.
  • Introduced a new tool, AnalyzeCovariates, to generate the BQSR quality assessment plots as a separate step, instead of doing it through the BaseRecalibrator.

Combine Variants

  • We no longer add PASS to the FILTER field of unfiltered records.

Variant Annotator

  • The RMSMappingQuality annotation now works properly with reduced reads.
  • The various rank sum tests no longer use reduced reads in their calculations (because those reads do not represent distinct observations).
  • Fixed bug in the BaseQualityRankSumTest annotation where it was not actually using the base qualities.
  • Added a new annotation DepthPerSampleHC that is used by default in the HaplotypeCaller.

Miscellaneous

  • James Warren contributed a patch so that references whose file names contain ".fa" in a non-suffix position parse correctly.
  • We now emit the GATK version number in the header of VCFs that we produce.
  • Fixed bug in the up-front downsampling used by the GATK: reduced reads are no longer allowed to be eliminated during downsampling.
  • dbSNP rsID matching is now smarter: variants are considered matching if they have the same reference allele and at least 1 common alternative allele.
  • We now warn users about using the GATK with RNA-seq data.
  • We now check that -compress arguments are within allowable range 0-9.
  • -rf ReassignMappingQuality can now be used to reassign mapping qualities to 60 before the engine filters them out with MappingQualityUnassigned.
  • Fixed bug where requesting gzip VCF output with multi-threading was causing the GATK to fail.
  • We now require a minimum -dcov value of 200 for Locus and ActiveRegion walkers when downsampling to coverage.
  • Zero-length and repeated cigar elements are collapsed down by default in the engine.
  • -ds option removed from PrintReads because it was redundant with the engine-level -dfrac argument.
  • Fixed bug where the --defaultBaseQualities argument didn't always work.
  • The engine now produces much more accurate read counts for Read traversals.
  • Count Reads now uses a Long instead of an Integer for counts to prevent overflows.
  • Locus Walkers now only try to clip adaptors when the two reads of a pair are on opposite strands.
  • Fixed VCF issue where PLs were capped at 32767.
  • Picard/Tribble/Variant jars updated to version 1.91.1453.
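
To illustrate the dbSNP rsID matching rule listed above (same reference allele and at least 1 common alternative allele), here is a minimal Python sketch. It is not the GATK code: the function name rsid_matches and its arguments are hypothetical, and it assumes the two records have already been found at the same position.

```python
def rsid_matches(call_ref, call_alts, db_ref, db_alts):
    """Return True if a called variant is considered to match a dbSNP record.

    Illustrative only (hypothetical helper, not a GATK API). Assumes both
    records sit at the same position. The rule applied is the one stated in
    the release note: identical reference allele and at least one shared
    alternative allele.
    """
    if call_ref != db_ref:
        return False
    # Set intersection finds the alternative alleles present in both records.
    shared_alts = set(call_alts) & set(db_alts)
    return len(shared_alts) > 0


# Made-up alleles, for demonstration only.
print(rsid_matches("A", ["G", "T"], "A", ["T"]))  # shares the alt allele T
print(rsid_matches("A", ["G"], "A", ["C"]))       # same ref, no shared alt
print(rsid_matches("AT", ["A"], "A", ["A"]))      # different ref allele
```

Under this rule a multi-allelic call can still pick up an rsID when only one of its alternative alleles is present in the dbSNP record.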

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
