The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.10.4 has MAJOR CHANGES that impact throughput of pipelines. Default compression is now 1 instead of 5, and Picard now handles compressed data with the Intel Deflator/Inflator instead of JDK.
GATK version 4.beta.3 (i.e. the third beta release) is out. See the github release page for download and details.

HaplotypeCaller Incorrectly making Heterozygous Calls (Again)

aeonsimaeonsim Member
edited March 2013 in Ask the GATK team


For GATK: GenomeAnalysisTK-2.4-7-g5e89f01

It would appear that the issue with the HaplotypeCaller making incorrect Het calls when it should be Hom has turned up again (if it ever actually went away). Note this appears to be the same issue I reported last time: but considering the time since that report, and that this is from a different sample set and different versions of GATK I thought it best to create a new post. If your prefer to merge them please do so.

So in this occasion we've been looking at a single animal (~16x) using the HaplotypeCaller & UnifiedGenotyper and once again we are finding that the HaplotypeCaller is making Heterozygous calls where there is no support for them in the BAM.

Example Regions, BosTau6 reference:

Attached you will find images showing this:
For the first position & image chr18:55,432,023-55,432,220 the tracks in IGV are:
HaplotypeCaller VCF, Population UnifiedGenotyper VCF, Sample BAM, Sample ReducedReads BAM.
If we look at this call we have 9x depth, all reads with mapQ 60, Cigar 101M and BaseQ between 21 & 33, a good balance between forward and reverse and ALL 9 reads contain the Alternate allele. Which means the Site should have been called Alt/Alt not Alt/Ref, but for some reason even through there are no reference reads the HaplotypeCaller has called the site Ref/Alt. Note the UnifiedGenotyper correctly calls this site Alt/Alt.

For the second image, position chr18:55,350,724-55,351,079 the tracks in IGV are:
Haplotype Caller VCF, UnifiedGenotyper VCF, UG Population VCF, Sample BAM (PCRdedup, IR, BQSR), Sample ReduceReads Bam
In this example we have 4 Variants in a cluster (chr18:55,350,895-55,350,975) that the HaplotypeCaller has called as Het (Ref/Alt) when there is no support for this call in the Reads, secondly the UnifiedGenotyper has successfully called each site as Homozygous (Alt/Alt). The reads are a bit more complex at this site but in each case there are 11 reads all of which are Alt alleles and no Reference allele.

Note: HaplotypeCaller & UnifiedGenotyper were run on the full bams, not the ReducedRead bams.

I will upload the region of the BAM file and the VCF files as: Chr18-HC-Het-issues-ULG.tar.gz

Post edited by aeonsim on
1648 x 1021 - 61K
1652 x 1021 - 75K


  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi there,

    OK, we'll take a closer look at this. Note that we were never able to replicate any of your issues in the past, which is probably why they were never fixed.

    Thanks for uploading the data files; we'll also need the exact command line that your are using with the latest stable version, and the BosTau6 reference files too. Looks like they were deleted off of our FTP from the last time you uploaded them.

  • Ok, is your FTP currently working I'm getting errors about service not available when attempting to upload the files.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    It should be working, yes. Are you sure you're using the correct credentials? They are different for uploading vs. downloading.

    Otherwise there is currently an issue that prevents direct connection through shell protocols, so if you're trying that you may need to switch to a GUI client.

  • Well I've tested it on the extracted BAM I'm trying to upload with the command:

     java -jar ../../tools/GenomeAnalysisTK-2.4-7-g5e89f01/GenomeAnalysisTK.jar -R ../../refs/bosTau6.fasta -T HaplotypeCaller -I Chr18-53m-55m.bam -L chr18:53970861-55552981 -o test.mara.vcf

    And gotten exactly the same result as in the Screenshots and previous VCFs so hopefully you'll be able to replicate it this time.

  • Ok that's an annoying bug I was using cli ftp, filezilla works. Upload of the BAM and VCFs is completed, reference will be up in the next 30mins or so.

  • Reference: bosTau6.fasta.bz2 uploaded, hopefully your able to replicate the issue this time.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Good news: we've been able to track down your issues.

    The first region is due to a bug that is fixed in our internal development version. It will be available in the next release, or you can use our nightly builds (see downloads page) if you can't wait. The bug was that we were accidentally filling in the bases for the nearby deletion which caused the wrong haplotype to be used for genotyping.

    The second region is an assembly/pruning problem that should be fixed by upcoming changes to the HC's assembly method. This should also be available in the next release. It is not currently available in the nightly builds however.

    Thanks for uploading your files and helping us fix these bugs!

  • Excellent, I will look forward to the update/release.

  • Hi,

    sorry for hijacking this thread but I thought this might be a related issue. I was using GATK for variant calling on maize and stumbled across the following call:

    chr_1 8823 ZmSYNBREED_10000_294 AC A 40626.35 . AC=58;BaseQRankSum=3.583;MQRankSum=1.436;InbreedingCoeff=0.4577;AF=0.935;HRun=1;AN=62;MQ0=0;FS=0.491;MQ=55.30;QD=21.19;HaplotypeScore=312.4568;SB=-19795.84;DS;DP=1931;ReadPosRankSum=0.686 GT:AD:DP:GQ:PL 0/0:14,0:14:42.14:0,42,635 1/1:26,28:55:95.13:1213,95,0 1/1:2,19:21:60.19:811,60,0 1/1:3,21:24:72.23:975,72,0 1/1:21,19:42:99:1004,101,0 1/1:84,126:214:99:5980,532,0 1/1:0,22:22:66.22:911,66,0 1/1:77,19:96:99:1386,193,0 1/1:53,46:102:99:2426,229,0 1/1:46,31:78:99:1676,178,0 1/1:39,11:52:94.14:747,94,0 1/1:10,4:15:12.72:208,13,0 1/1:68,7:76:99:617,134,0 1/1:4,22:26:78.19:994,78,0 0/1:11,15:26:29.99:606,0,30 1/1:18,19:37:40.93:868,41,0 1/1:52,17:70:99:1145,121,0 1/1:63,16:79:99:1177,158,0 1/1:89,112:202:99:5408,495,0 1/1:32,14:46:22.64:800,23,0 1/1:29,32:61:99:1723,159,0 1/1:11,25:36:93.85:1127,94,0 1/1:14,13:27:66:639,66,0 1/1:117,49:167:99:2834,368,0 1/1:19,20:39:58.17:853,58,0 1/1:45,8:53:97.28:576,97,0 1/1:60,18:77:99:1115,147,0 1/1:54,12:66:99:951,136,0 1/1:1,19:20:60.20:827,60,0 1/1:38,16:54:99:969,118,0 0/1:32,0:33:60:60,0,644

    What appears odd to me are the genotypes given the actual read depths, e.g. the very last sample: 32x reference, 0x alternative => heterozygous or the very first sample: 26x ref, 28x alt => homozygous alt. When looking into the BAM file for this example, I foudn that all reads are mapped with good mapQ and baseQ. Is there any obvious explanation for this behaviour that I am missing?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi @mseidel,

    That sounds like the same bug aeonsim reported. Were you also using HaplotypeCaller?

  • Ok, I feared so. I am actually using the UnifiedGenotyper with reference, my bamfiles and --dbsnp, --genotype_likelihoods_model BOTH -L -o .

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hmm, might not be the same issue then since aeonsim's bug is specific to HaplotypeCaller. Are you using the latest version of GATK?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Note that your problem might simply be due to the AD annotation not being properly computed for indels. We have seen that happen before. You should check the pileup of reads (eg in a genome browser like IGV) and see for yourself what is the allele depth at that site.

  • I was using GenomeAnalysisTK-1.6-5-g557da77 which at the time of the analysis was the latest version but I just found this oddity today. Any chance this is a known bug which has been fixed in more recent versions?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Oh, that's a very old version. It is extremely likely that if that was a bug, it has been fixed since then. You should definitely upgrade to a more recent version. We are currently on 2.4-9.

  • Ok, I'll rerun with the latest version and report back if the problem persists. Sorry for bothering with an issue from an out-of-date version :S

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hah, no worries :) Good luck!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi @aeonsim, FYI we've been able to confirm that the bug you saw in your second region is fixed in our development build, and the fix is now available in the nightly builds. The official release containing all these fixes should come out in a couple of weeks.

  • I ran Haplotype Caller with 2.4-9 and I have the issue with incorrect heterozygous calls. Specifically many of the SNPs, are showing up as heterzygous - but the SNP is absent in the bam files and in Unified Genotyper (run on the same bam files).image

    1571 x 639 - 29K
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    We believe this is due to a bug that has been fixed internally. You can either use the latest nightly builds to test whether the problem still occurs for you, or wait for the upcoming 2.5 release.

  • Hi,

    sorry to bother again. I reran my analysis using GenomeAnalysisTK-2.4-9-g532efad and still find various odd calls, e.g. the following:

    chr_8 132298485 . CAGCCAGCA C 7720.52 . AC=41;AF=0.707;AN=58;BaseQRankSum=10.626;DP=202;FS=20.148;InbreedingCoeff=0.5379;MLEAC=41;MLEAF=0.707;MQ=42.23;MQ0=0;MQRankSum=-8.882;QD=7.26;ReadPosRankSum=-2.279 GT:AD:DP:GQ:PL [...] 0/1:0,19:21:50:1451,0,50 [...]

    The sample is clearly homozygous alt but gets called as het. Other samples are:


    Is there some filtering after the call has been made that removes coverage or is there something else that I might be missing?

    I noticed that those issues usually appear at sites with indels. I also notice that the PL difference in the first example is rather low (0 vs 2) while the overal quality at this position is extremly high with 14639.66.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    We are aware that the odd het calls issue still occurs in 2.4-9. The fix for the problem will be available in version 2.5, which we aim to release within the next few days. Otherwise you can try the latest nightly build (see Download page) to check that your issue no longer occurs with our latest dev fixes.

  • ZaagZaag Member

    Could this also lead to a not called heterozygous SNP?

    I have a heterozygous SNP that get's called with a depth of around 600 with the Unifiedgenotyper but not at all with the Haplotypecaller. All steps done with GATK 2.4-9

    I could provide more info, but if 2.5 or the nightly build might solve it I'll try that first.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    You can go ahead and try with the nightly build, sure. If it's still not called then please post the VCF line of the UG call, and an IGV screenshot is always helpful as well.

  • ZaagZaag Member

    Nightly build called it correct. Just to be complete, here is the line from the HaplotypeCaller and an IGV shot.

    chr10 88478529 . G A 1900.77 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-2.806;ClippingRankSum=-1.819;DP=256;FS=8.994;MLEAC=1;MLEAF=0.500;MQ=47.09;MQ0=0;MQRankSum=-1.022;QD=7.42;ReadPosRankSum=0.678 GT:AD:GQ:PL 0/1:109,95:99:1929,0,1564

    Thanks for your answer :)

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Glad to see it's working for you! Nice IGV shot :)

  • Hi - I'm getting HaplotypeCaller (version 2.8-1-g932cd3a) calling heterozygotes where there is no evidence in IGV as well. It is also calling a homozygote when it looks like a heterozygote.

    What's the latest on this one? (see attached screenshots for incorrect heterozygote)

    Variant callers 7.bmp
    Variant callers 8.bmp
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    @mike_boursnell, could you please upload a test snippet for us to debug locally? We'll try to sort this out asap for you.

  • How do I make a snippet? With samtools view?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    You can do it easily with PrintReads, just specify the interval that you want the snippet from.

Sign In or Register to comment.