Release notes for GATK version 2.1
Base Quality Score Recalibration
Multi-threaded support in the BaseRecalibrator tool has been temporarily suspended for performance reasons; we hope to have this fixed for the next release.
Implemented support for SOLiD no-call strategies other than throwing an exception.
Fixed smoothing in the BQSR bins.
Fixed the R plotting script to be compatible with newer versions of R and the ggplot2 library.
Renamed the per-sample ML allelic fractions and counts so that they don't have the same name as the per-site INFO fields, and clarified the description in the VCF header.
UG now makes use of base insertion and base deletion quality scores if they exist in the reads (output from BaseRecalibrator).
Renamed the -maxAlleles argument to -maxAltAlleles, which more accurately describes what it limits.
In pooled mode, if haplotypes cannot be created from the given alleles when genotyping indels (e.g. because the site is too close to a contig boundary), the tool no longer tries to genotype.
Added improvements to indel calling in pooled mode: we now compute per-read likelihoods in the reference sample to determine whether a read is informative or not.
Added LowQual filter to the output when appropriate.
Added some support for calling on Reduced Reads. Note that this is still experimental and may not always work well.
Now does a better job of capturing low frequency branches that are inside high frequency haplotypes.
Updated VQSR to work with the MNP and symbolic variants that are coming out of the HaplotypeCaller.
Made fixes to the likelihood-based LD calculation for deciding when to combine consecutive events.
Fixed bug where non-standard bases from the reference would cause errors.
Better separation of arguments that are relevant to the Unified Genotyper but not the Haplotype Caller.
Fixed bug where reads were soft-clipped beyond the limits of the contig and the tool was failing with a NoSuchElement exception.
Fixed a divide-by-zero bug that occurred when the downsampler goes over regions where all reads are filtered out.
Fixed a bug where downsampled reads were not being excluded from the read window, causing them to trail back and get caught by the sliding window exception.
Fixed support in the AlleleCount stratification when using the MLEAC (it is now capped by the AN).
Fixed incorrect allele counting in IndelSummary evaluation.
Now outputs the first non-MISSING QUAL, instead of the maximum.
Now supports multi-threaded running (with the -nt argument).
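To illustrate the QUAL change noted above (first non-MISSING value instead of the maximum), here is a minimal Python sketch. It is not GATK code: the function names are hypothetical, and a missing QUAL is represented as `None` purely for illustration.

```python
from typing import Optional, Sequence


def first_non_missing_qual(quals: Sequence[Optional[float]]) -> Optional[float]:
    """Return the first QUAL that is present, or None if every input is missing.

    Hypothetical helper: None stands in for a MISSING QUAL value.
    """
    for qual in quals:
        if qual is not None:
            return qual
    return None


def max_qual(quals: Sequence[Optional[float]]) -> Optional[float]:
    """The older behavior, for comparison: the maximum of the present QUALs."""
    present = [q for q in quals if q is not None]
    return max(present) if present else None
```

With inputs whose QUALs are missing, 30.0 and 99.0 (in that order), the first function picks 30.0 where the second picks 99.0, so the result now depends on input order rather than on the largest value.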
Fixed behavior of the --regenotype argument to do proper selecting (without losing any of the alternate alleles).
No longer adds the DP INFO annotation if DP wasn't used in the input VCF.
If MLEAC or MLEAF is present in the original VCF and the number of samples decreases, remove those annotations from the output VC (since they are no longer accurate).
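The MLEAC/MLEAF rule above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the GATK implementation: the INFO field is modeled as a plain dict, and the function name and parameters are hypothetical.

```python
from typing import Any, Dict

# Annotations that describe the full input cohort and become stale after subsetting.
STALE_AFTER_SUBSETTING = ("MLEAC", "MLEAF")


def drop_stale_mle_annotations(
    info: Dict[str, Any], n_samples_in: int, n_samples_out: int
) -> Dict[str, Any]:
    """Return a copy of `info`, without MLEAC/MLEAF if samples were removed.

    Hypothetical helper: `info` is a dict standing in for a VCF INFO field.
    """
    if n_samples_out < n_samples_in:
        return {k: v for k, v in info.items() if k not in STALE_AFTER_SUBSETTING}
    return dict(info)
```

If the sample count is unchanged, the annotations are left in place, since they still describe the samples in the output.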
Updated and improved the BadCigar read filter.
GATK now generates a proper error when a gzipped FASTA is passed in.
Various improvements throughout the BCF2-related code.
Removed various parallelism bottlenecks in the GATK.
Added support for the X and = CIGAR operators to the GATK.
Catch NumberFormatExceptions when parsing the VCF POS field.
Fixed bug in FastaAlternateReferenceMaker when input VCF has overlapping deletions.
Fixed AlignmentUtils bug for handling Ns in the CIGAR string.
Lower-case bases are now allowed in the REF/ALT alleles of a VCF; they are converted to upper case.
Added support for handling complex events in ValidateVariants.
Picard jar remains at version 1.67.1197.
Tribble jar remains at version 110.
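For readers unfamiliar with the X and = CIGAR operators mentioned above: in the SAM specification, = is a sequence match and X is a sequence mismatch, and both consume read and reference bases just as M does. The Python sketch below shows how a parser might account for them. It is an illustration only, not GATK code, and the function name is hypothetical.

```python
import re
from typing import Tuple

# One CIGAR element: a length followed by a SAM operator, including = and X.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM specification: which operators consume reference / read bases.
CONSUMES_REF = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")


def cigar_lengths(cigar: str) -> Tuple[int, int]:
    """Return (reference_length, read_length) covered by a CIGAR string."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing anything the pattern did not fully account for.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"Malformed CIGAR string: {cigar!r}")
    ref_len = sum(int(n) for n, op in ops if op in CONSUMES_REF)
    query_len = sum(int(n) for n, op in ops if op in CONSUMES_QUERY)
    return ref_len, query_len
```

A CIGAR of `5=1X4=` covers the same ten reference and read bases as `10M`; the only difference is that it records where the single mismatch falls.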
Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT