The current GATK version is 3.8-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Got a problem?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?

Then follow instructions in Article#1894.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Download the latest Picard release at
GATK version 4.beta.3 (i.e. the third beta release) is out. See the GATK4 beta page for download and details.

Haplotype Caller incorrectly calling Blocks of Variants Heterozygous

aeonsimaeonsim Member
edited November 2012 in Ask the GATK team


I seem to have found a bit of an issue with the Haplotype caller. Looking at variants called with it I've come across a number of small blocks in the genome where the Haplotype caller has called every individual (50 individuals) either RA or RR, which seemed a bit odd considering the population.

Looking at the BAMs and VCFs from SAMtools and the Unified Genotyper these blocks of snps clearly contain all three states as I'd expect RR/RA/AA. Looking at the BAM the reads are of decent quality and have no nearby insertions or deletions to complicate things, and the variants have been called correctly by Samtools and UG.

Any idea what's causing this? Attached is an IGV image showing one of the regions in question, Top VCF is the Haplotype Caller (showing all calls as RA or RR, which is incorrect), Second is UG (showing a mix of RR/RA/AA which is correct). The First BAM shows one of the Animals HC is calling incorrectly as RA for the 5 SNPs shown, while the Second is an Animal that HC is calling RA correctly.

Note these incorrect calls from the HC also passed VQSR.
I believe the version of GATK is one of the 2.1 releases.

Post edited by Geraldine_VdAuwera on

Best Answer


  • rpoplinrpoplin Dev
    edited November 2012

    Hi there,

    This is very curious. Would you be able to share the data from this region with us to help us debug the problem?

    You can use PrintReads (with -L chr9:52,315,067-52,315,498, for example) to extract an interval of your BAM file and upload it to our FTP server:

    Thanks for your help with this!

  • I've uploaded a small region around the problem zone mentioned above (HC-het-issue.tar.gz). I've included the HC and UG calls for the region and a bam containing that region for 4 individuals. Two are Hom Alt's which have been called incorrectly as Het by HC while the remaining two are examples of a Hom Ref and an actual Het.

  • Quick note: The version of GATK used was 2.1-8-g5efb575.

  • Thank you very much for sending these files. Would you be able to also provide the B. taurus reference files you are using?


  • It's the standard BosTau6 ref (same as in IGV) but I'll upload a copy to the FTP for you.

  • @rpoplin said:
    Thank you very much for sending these files. Would you be able to also provide the B. taurus reference files you are using?

    I've uploaded it as bosTau6.fasta.gz, though the ftp app may have played up a little at the end, if it's corrupted let me know and I'll upload it again.


  • rpoplinrpoplin Dev
    edited November 2012

    Hmmm, when I run this now with the latest version (2.2-10) I'm not able to replicate the problem.

    I used this command line:

    java -jar dist/GenomeAnalysisTK.jar -T HaplotypeCaller -L chr9:52,315,067-52,315,498 -I HCissue.bam -R bosTau6.fasta -o output.vcf -debug

    and it calls the block of variants in your screenshot as:

    BE091642881 - hom ref

    BE155317935 - het

    BE432902494 - hom var

    BE832676583 - hom var

    which looks correct to me.

    Can you run this with the latest version and see if it looks correct to you now?

    Thanks for your help,

  • @rpoplin
    I've had a go with 2.2-15 and the results are variable. If I only use a few animals or a very small region of the genome I get the correct results, for example this small target is called correctly.

    -L chr9:52,315,067-52,315,498

    However if I use a larger region say 2Kb then it goes straight back to calling all the variant HetRA.

    java -jar ../../tools/GenomeAnalysisTK-2.2-15-g9214b2f/GenomeAnalysisTK.jar -T HaplotypeCaller -R ../../refs/bosTau6.fasta -I 50-anmls.bam -o 50-anmls.bam.vcf -L chr9:52,314,000-52,316,000

    The attached ss shows this. I'll upload a larger BAM containing 50 individuals and more seq around that region to the file server.
    The ss shows in order VCFs for (all VCFs HC unless stated otherwise):

    • Small targeted region 4 animals: Correct
    • Small targeted region 50 animals.: 3 Vars Correct 4th Wrong!
    • Original Wrong VCF
    • Original Unified Genotyper VCF: Correct
    • Larger Region 2kb using command above: Incorrect (all het)
    • Larger Region 10Kb using same settings as above: Incorrect all Het.

    File is:

  • Thanks, looks like we'll just stick with the Unified Genotyper for a couple more releases then retest the Haplotype caller. This is a rather disconcerting bug though, incorrectly called Het/Hom genotypes can cause significant issues for some of our work.

    Do you have an automated test that looks for Genotypes changing between versions of GATK, and/or compare simple clean SNP (no nearby reps, indels etc) genotype calls between the outputs of UG and HC? It may be worth investigating as it would probably catch cases like this.


  • thondeboerthondeboer Redwood City, CA, USAMember

    I also see this problem crop up in the version I am using which unfortunately is still version 2012.1-2.2.16-2-g9513dd0 since that is the one that is only available for us commercial customers...Can someone indicate if this has been fixed and in which version? The commercial release is due out and I am sure it is some very old version again, so want to make sure we get a fixed version...Not sure why I should bother with the commercial version if the only real advantage is HaplotypeCaller and that was buggy in release 1

    This is what I see:

    <br /> 0/1:0,37:99:2790,0,148<br />
    This was the original line
    <br /> 17 63534781 . C A 2761.77 . AC=1;AF=0.500;AN=2;ActiveRegionSize=295;DP=39;EVENTLENGTH=0;FS=0.000;HaplotypeScore=40.9590;MLEAC=1;MLEAF=0.500;MQ=53.23;MQ0=0;NVH=5;NumHapAssembly=12;NumHapEval=3;QD=70.81;QDE=14.16;TYPE=SNP;extType=SNP GT:AD:GQ:PL 0/1:0,37:99:2790,0,148<br />



  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie

    Hi Thon,

    I'm almost certain that this has been fixed since version 2.2. We have made a lot of improvements to the HC since then. Keep in mind that it started out as a highly experimental tool and has been under intense development. These types of bugs will occur less as the tool matures.

    I believe the next commercial release will synch up to our upcoming 2.4 release (but can't guarantee that since it's not my turf).

Sign In or Register to comment.