HaplotypeCaller Incorrectly making Heterozygous Calls (Again)
For GATK: GenomeAnalysisTK-2.4-7-g5e89f01
The issue with the HaplotypeCaller making Het calls where the genotype should be Hom appears to have turned up again (if it ever actually went away). This looks like the same issue I reported last time: http://gatkforums.broadinstitute.org/discussion/1805/haplotype-caller-incorrectly-calling-blocks-of-variants-heterozygous but given the time since that report, and that this comes from a different sample set and a different version of GATK, I thought it best to create a new post. If you prefer to merge them, please do so.
On this occasion we have been looking at a single animal (~16x) using the HaplotypeCaller and the UnifiedGenotyper, and once again we are finding that the HaplotypeCaller makes Heterozygous calls that have no support in the BAM.
Example Regions, BosTau6 reference:
Attached you will find images showing this:
For the first position & image chr18:55,432,023-55,432,220 the tracks in IGV are:
HaplotypeCaller VCF, Population UnifiedGenotyper VCF, Sample BAM, Sample ReducedReads BAM.
At this call we have 9x depth. All reads have mapQ 60, Cigar 101M and BaseQ between 21 & 33, there is a good balance between forward and reverse strands, and ALL 9 reads contain the Alternate allele. The site should therefore have been called Alt/Alt, not Ref/Alt, but even though there are no reference reads the HaplotypeCaller has called it Ref/Alt. Note the UnifiedGenotyper correctly calls this site Alt/Alt.
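To put a rough number on why 9 Alt reads and 0 Ref reads should favour Alt/Alt, here is a minimal sketch of a simple per-read diploid likelihood. It is not the HaplotypeCaller's actual model (which works on assembled haplotypes); the function name and the flat 1% base error rate (roughly Q20, just below the lowest BaseQ seen here) are assumptions for illustration only.

```python
import math

def genotype_log10_likelihoods(n_ref, n_alt, base_error=0.01):
    """Log10 likelihoods under a simple model where each read is an independent draw.

    base_error is an assumed flat per-base error rate, not a value taken from GATK.
    """
    # Probability that a single read shows the Alt allele under each genotype.
    p_alt = {
        "0/0": base_error,        # Alt only seen through sequencing error
        "0/1": 0.5,               # either chromosome sampled with equal chance
        "1/1": 1.0 - base_error,  # Ref only seen through sequencing error
    }
    return {
        gt: n_alt * math.log10(p) + n_ref * math.log10(1.0 - p)
        for gt, p in p_alt.items()
    }

# Read counts reported for the first site: 0 Ref reads, 9 Alt reads.
ll = genotype_log10_likelihoods(n_ref=0, n_alt=9)
best = max(ll, key=ll.get)
for gt in sorted(ll, key=ll.get, reverse=True):
    print(gt, round(ll[gt], 3))
```

Under these assumptions the Het likelihood is 9 * log10(0.5), about 2.7 log10 units below Alt/Alt, so a pileup-based caller would be expected to prefer Alt/Alt at this site.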
For the second image, position chr18:55,350,724-55,351,079 the tracks in IGV are:
HaplotypeCaller VCF, UnifiedGenotyper VCF, UG Population VCF, Sample BAM (PCRdedup, IR, BQSR), Sample ReducedReads BAM
In this example there is a cluster of 4 variants (chr18:55,350,895-55,350,975) that the HaplotypeCaller has called Het (Ref/Alt), although the reads give no support for that call. The UnifiedGenotyper has called each of these sites Homozygous (Alt/Alt). The reads are a bit more complex at this site, but at each of the 4 positions there are 11 reads, all of which carry the Alt allele and none the Reference allele.
Note: HaplotypeCaller & UnifiedGenotyper were run on the full BAMs, not the ReducedReads BAMs.
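For anyone wanting to look for other sites like these, the following is a rough sketch of a filter that flags Het calls whose allele depths show zero reference reads. It assumes a single-sample VCF whose FORMAT column carries GT and AD; the function name is hypothetical, and the record in the usage lines is a made-up example, not taken from the attached files. Note that AD as written by the caller may not match the raw pileup seen in IGV, so any hits still need checking against the BAM.

```python
def het_calls_without_ref_support(vcf_lines):
    """Yield (chrom, pos, GT, AD) for Ref/Alt calls where AD reports 0 Ref reads.

    Assumes one sample column (column 10) and GT/AD present in FORMAT.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column
        chrom, pos = fields[0], int(fields[1])
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", "")
        ad = sample.get("AD", "")
        if not gt or not ad or "." in gt or "." in ad:
            continue  # missing genotype or depths
        alleles = gt.replace("|", "/").split("/")
        depths = [int(x) for x in ad.split(",")]
        # Het call involving the reference allele, yet no reads support Ref.
        if "0" in alleles and len(set(alleles)) > 1 and depths[0] == 0:
            yield chrom, pos, gt, depths

# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr18\t100\t.\tA\tG\t50\t.\t.\tGT:AD:DP\t0/1:0,9:9",
    "chr18\t200\t.\tC\tT\t50\t.\t.\tGT:AD:DP\t1/1:0,11:11",
    "chr18\t300\t.\tG\tA\t50\t.\t.\tGT:AD:DP\t0/1:5,6:11",
]
flagged = list(het_calls_without_ref_support(example))
print(flagged)
```

To run it on a real file, pass an open file handle instead of the list, e.g. `het_calls_without_ref_support(open(path))` for an uncompressed VCF.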
I will upload the region of the BAM file and the VCF files as: Chr18-HC-Het-issues-ULG.tar.gz