# Release notes for GATK version 2.2

edited November 2012

GATK 2.2 was released on October 31, 2012. Highlights are listed below; the detailed version history is available at http://www.broadinstitute.org/gatk/guide/version-history

## Base Quality Score Recalibration

• Improved the algorithm around homopolymer runs to use a "delocalized context".
• Massive performance improvements that allow these tools to run efficiently (and correctly) in multi-threaded mode.
• Fixed bug where the tool failed for reads that begin with insertions.
• Fixed bug in the scatter-gather functionality.
• Added new argument to enable emission of the .pdf output file (see --plot_pdf_file).

## Unified Genotyper

• Massive runtime performance improvement for multi-allelic sites; -maxAltAlleles now defaults to 6.
• The genotyper no longer emits the Strand Bias (SB) annotation by default. Use the --computeSLOD argument to enable it.
• Added the ability to automatically down-sample reads to remove low-grade contamination from the input BAM files using the --contamination_fraction_to_filter argument; the default value is 0.05 (5%).
• Fixed annotations (AD, FS, DP) that were miscalculated when run on a BAM file processed by Reduce Reads.
• Fixed bug for the general ploidy model that occasionally caused it to choose the wrong allele when there are multiple possible alleles to choose from.
• Fixed bug where the inbreeding coefficient was computed at monomorphic sites.
• Fixed edge case bug where we could abort prematurely in the special case of multiple polymorphic alleles and samples with drastically different coverage.
• Fixed bug in the general ploidy model where it wasn't counting errors in insertions correctly.
• The FisherStrand annotation is now computed both with and without filtering low-quality bases (we compute both p-values and take the maximum, i.e. the least significant one).
• Fixed annotations (particularly AD) for indel calls; previous versions didn't correctly bin reads into the reference or alternate sets.
• Generalized ploidy model now handles reference calls correctly.
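
The FisherStrand change above (two p-values, keep the larger) can be pictured with a short Python sketch. This is not GATK code: the 2x2 table layout (rows for reference/alternate allele, columns for forward/reverse strand) and both function names are assumptions made for illustration.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, total)

def strand_bias_p_value(unfiltered_table, filtered_table):
    """Hypothetical helper: keep the larger (least significant) p-value."""
    return max(fisher_exact_two_sided(unfiltered_table),
               fisher_exact_two_sided(filtered_table))
```

Taking the maximum is the conservative choice: a site is only flagged as strand-biased if both the filtered and unfiltered counts support that conclusion.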

## Haplotype Caller

• Massive runtime performance improvement for multi-allelic sites; -maxAltAlleles now defaults to 6.
• Massive runtime performance improvement to the HMM code which underlies the likelihood model of the HaplotypeCaller.
• Added the ability to automatically down-sample reads to remove low-grade contamination from the input BAM files using the --contamination_fraction_to_filter argument; the default value is 0.05 (5%).
• Now requires at least 10 samples to merge variants into complex events.
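
As a rough intuition for the contamination down-sampling mentioned above, the Python sketch below drops a fixed fraction of reads at random. These notes do not describe how GATK selects the reads to remove, so the uniform random removal, the function name, and the `seed` parameter are all assumptions.

```python
import random

def downsample_reads(reads, contamination_fraction=0.05, seed=None):
    """Randomly drop roughly `contamination_fraction` of the reads.

    Assumption: reads are removed uniformly at random; the real tool
    may use a different strategy.
    """
    if not 0.0 <= contamination_fraction <= 1.0:
        raise ValueError("contamination_fraction must be between 0 and 1")
    rng = random.Random(seed)
    # Keep a read when its random draw falls outside the dropped fraction.
    return [r for r in reads if rng.random() >= contamination_fraction]
```

With the default of 0.05, about 5% of the reads are discarded before any downstream calculation sees them.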

## Variant Annotator

• Fixed annotations for indel calls; previous versions either didn't compute the annotations at all or did so incorrectly for many of them.

## Reduce Reads

• Fixed several bugs where certain reads were either dropped (fully or partially) or registered as occurring at the wrong genomic location.
• Fixed bugs where, in rare cases, N bases were chosen as consensus over legitimate A, C, G, or T bases.
• Significant runtime performance optimizations; the average runtime for a single exome file is now just over 2 hours.

## Variant Filtration

• Fixed a bug where DP couldn't be filtered from the FORMAT field, only from the INFO field.

## Variant Eval

• AlleleCount stratification now supports records with ploidy other than 2.
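
To illustrate what counting alleles looks like once ploidy is not fixed at 2, here is a minimal Python sketch that tallies non-reference alleles from VCF-style GT strings. The function name and the simplified parsing are assumptions; this is not how VariantEval is implemented.

```python
import re

def alt_allele_count(genotypes):
    """Count non-reference allele calls across GT strings of any ploidy."""
    count = 0
    for gt in genotypes:
        # Alleles are separated by '/' (unphased) or '|' (phased).
        for allele in re.split(r"[/|]", gt):
            if allele not in (".", "0"):  # skip missing and reference calls
                count += 1
    return count
```

For example, a haploid call `1`, a tetraploid call `0/0/1/1`, and a partially missing call `./1` contribute 1, 2, and 1 alternate alleles respectively.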

## Combine Variants

• Fixed bug where the AD field was not handled properly. We now strip the AD field out whenever the alleles change in the combined file.
• Now outputs the first non-missing QUAL, not the maximum.
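
The QUAL rule above can be sketched in a few lines of Python. Representing a missing QUAL as `None` and the function name are assumptions made for this example.

```python
def combined_qual(quals):
    """Return the first QUAL that is not missing, or None if all are missing."""
    for q in quals:
        if q is not None:
            return q
    return None
```

Given input QUAL values of missing, 30.0, and 50.0 (in input order), this returns 30.0, whereas taking the maximum would have returned 50.0.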

## Select Variants

• Fixed bug where the AD field was not handled properly. We now strip the AD field out whenever the alleles change in the output file.
• Removed the -number argument because it gave biased results.
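
The AD handling described above can be pictured as follows. AD holds one depth per allele, so it becomes misleading once the allele list changes. The Python sketch below uses a plain dictionary of per-sample FORMAT fields; that data layout and the function name are assumptions, not the GATK implementation.

```python
def strip_ad_if_alleles_changed(original_alleles, new_alleles, sample_fields):
    """Return per-sample FORMAT fields, dropping AD if the allele list changed."""
    if list(original_alleles) == list(new_alleles):
        # Alleles are unchanged, so AD is still valid; return a copy.
        return {s: dict(f) for s, f in sample_fields.items()}
    # Alleles differ: AD no longer lines up with them, so remove it.
    return {s: {k: v for k, v in f.items() if k != "AD"}
            for s, f in sample_fields.items()}
```

Dropping the field is a simple alternative to recomputing per-allele depths, which would require going back to the reads.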

## Validate Variants

• Added option to selectively choose particular strict validation options.
• Fixed bug where mixed genotypes (e.g. ./1) would incorrectly fail.
• Improved the error message around unused ALT alleles.

## Somatic Indel Detector

• Fixed several bugs, including missing AD/DP header lines and annotations that were not in the correct order (Ref/Alt).

## Miscellaneous

• Fixed raw HapMap file conversion bug in VariantsToVCF.
• Added GATK-wide command line argument (-maxRuntime) to control the maximum runtime allowed for the GATK.
• Fixed bug in GenotypeAndValidate where it couldn't handle both SNPs and indels.
• Fixed bug where VariantsToTable did not handle lists and nested arrays correctly.
• Fixed bug in BCF2 writer for case where all genotypes are missing.
• Fixed bug in DiagnoseTargets when intervals with zero coverage were present.
• Fixed bug in Phase By Transmission when there are no likelihoods present.
• Fixed bug in fasta .fai generation.
• Picard jar remains at version 1.67.1197.
• Tribble jar remains at version 110.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

