HaplotypeCaller Indel detection

biocaitao Member Posts: 2
edited November 2012 in Ask the GATK team

We find HaplotypeCaller to be an excellent SNP caller, but we were recently confused by its indel results. We did targeted sequencing on 6 samples (3 cases vs. 3 controls). We followed the Best Practices recommendations except for VariantRecalibrator (there were only around 600 SNPs, which seemed too few for recalibration). HaplotypeCaller correctly detected a SNP, but the neighboring deletion looks strange: samtools tview shows no clear sign of the deletion. We wonder whether the deletion came from HaplotypeCaller's de novo assembly, and whether it is credible. Thanks.

The command line:

java -Xmx4g -jar ~/GenomeAnalysisTK-2.1-9-gb90951c/GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I sample1.clean.dedup.recal.bam -I sample2.clean.dedup.recal.bam -I sample3.clean.dedup.recal.bam -I sample4.clean.dedup.recal.bam -I sample5.clean.dedup.recal.bam -I sample6.clean.dedup.recal.bam --dbsnp dbsnp_135.hg19.vcf -L target.interval_list  -stand_call_conf 50.0 -stand_emit_conf 10.0 -o samples_new.raw.snps.indels.vcf

HaplotypeCaller result:

SNP: 19448410   .    T  G   2126.64 .   AC=6;AF=0.500;AN=12;ActiveRegionSize=135;ClippingRankSum=18.283;DP=976;EVENTLENGTH=0;FS=1076.837;MLEAC=6;MLEAF=0.500;MQ=58.70;MQRankSum=-2.107;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=2.18;QDE=0.73;ReadPosRankSum=-17.858;TYPE=SNP;  GT:GQ:PL    0/1:99:195,0,2945   0/1:99:936,0,6037   0/1:99:354,0,3059   0/1:99:301,0,4595   0/1:99:187,0,2191   0/1:99:203,0,2617
Indel: 19448411 .    GTGGCTCC   G   274.85  .   AC=3;AF=0.250;AN=12;ActiveRegionSize=135;ClippingRankSum=9.296;DP=1019;EVENTLENGTH=-7;FS=328.629;MLEAC=3;MLEAF=0.250;MQ=58.74;MQRankSum=-0.451;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=0.46;QDE=0.15;ReadPosRankSum=-11.624;TYPE=INDEL;    GT:GQ:PL    0/0:99:0,106,14388  0/1:99:200,0,28094  0/0:45:0,45,15261   0/1:50:50,0,22048   0/1:74:74,0,10244   0/0:42:0,42,12913
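For readers comparing the two records above, the INFO column can be split into key/value pairs to pull out annotations such as QD and FS. The following is a minimal Python sketch, not part of the original post: the helper name `parse_info` is hypothetical, and the INFO strings are abbreviated copies of the posted records.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.strip(";").split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Abbreviated INFO strings copied from the SNP and indel records above.
snp = parse_info(
    "AC=6;AF=0.500;AN=12;DP=976;FS=1076.837;MQ=58.70;QD=2.18;"
    "ReadPosRankSum=-17.858;TYPE=SNP;"
)
indel = parse_info(
    "AC=3;AF=0.250;AN=12;DP=1019;FS=328.629;MQ=58.74;QD=0.46;"
    "ReadPosRankSum=-11.624;TYPE=INDEL;"
)

for name, rec in (("SNP", snp), ("Indel", indel)):
    print(name, "QD =", float(rec["QD"]), "FS =", float(rec["FS"]))
```

FS is the Phred-scaled p-value of a Fisher's exact test for strand bias, so values in the hundreds, as in both records here, suggest a strong strand imbalance that may be worth inspecting alongside the low QD of the indel.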


  • rpoplin Dev Posts: 122 ✭✭✭

    Hmmm, it is hard to really comment without seeing the data. Would you be able to post an IGV screenshot of the region? Also, you can add -debug to your command line and get all of the verbose debugging output from the HaplotypeCaller.

    I hope that helps,

  • biocaitao Member Posts: 2
    edited November 2012

    Thanks a lot, rpoplin and Geraldine.

    I reran the whole pipeline using the latest 2.2 bundle, and I was surprised to see that the deletion result differs from last time: sample 6 changed from "0/0" to "0/1". I also ran Pindel as a double check. It seems there are deletions of various lengths around this region in all samples. I am not sure whether the region is too complex for HaplotypeCaller to assemble.

    Some details: IGV view, command line, new result, and the debug info (sorry, I have removed the position):


    java -Xmx4g -jar ~/GenomeAnalysisTK-2.2-5-g3bf5e3f/GenomeAnalysisTK.jar -T HaplotypeCaller -R ucsc.hg19.fasta -I sample1.clean.dedup.recal.bam -I sample2.clean.dedup.recal.bam -I sample3.clean.dedup.recal.bam -I sample4.clean.dedup.recal.bam -I sample5.clean.dedup.recal.bam -I sample6.clean.dedup.recal.bam --dbsnp dbsnp_137.hg19.vcf -L target.interval_list --debug -stand_call_conf 50.0 -stand_emit_conf 10.0 -o samples_new.raw.snps.indels.vcf


    GTGGCTCC        G       415.06  .       AC=4;AF=0.333;AN=12;ActiveRegionSize=135;BaseQRankSum=4.593;ClippingRankSum=6.318;DP=934;EVENTLENGTH=-7;FS=149.291;HaplotypeScore=67.2693;MLEAC=4;MLEAF=0.333;MQ=58.82;MQ0=0;MQRankSum=-0.061;NVH=3;NumHapAssembly=17;NumHapEval=13;QD=0.63;QDE=0.21;ReadPosRankSum=-7.676;TYPE=INDEL;extType=INDEL     GT:AD:GQ:PL     0/0:113,0:63:0,63,14050 0/1:222,15:99:322,0,27417       0/0:123,1:58:0,58,14897 0/1:174,3:44:44,0,21596 0/1:80,4:67:67,0,10023  0/1:105,3:22:22,0,12567
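One way to gauge read support for the deletion in the record above is to compute the alternate-allele fraction from each sample's AD field. This is a minimal Python sketch, not part of the original post: the function name `alt_fraction` is hypothetical, the sample strings are copied from the record above, and the code assumes a biallelic site with FORMAT `GT:AD:GQ:PL`.

```python
FORMAT = "GT:AD:GQ:PL"

# Per-sample columns copied from the indel record above (samples 1 to 6).
SAMPLES = [
    "0/0:113,0:63:0,63,14050",
    "0/1:222,15:99:322,0,27417",
    "0/0:123,1:58:0,58,14897",
    "0/1:174,3:44:44,0,21596",
    "0/1:80,4:67:67,0,10023",
    "0/1:105,3:22:22,0,12567",
]


def alt_fraction(sample, fmt=FORMAT):
    """Return (GT, ref depth, alt depth, alt fraction) for a biallelic site."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    ref, alt = (int(x) for x in values["AD"].split(","))
    total = ref + alt
    return values["GT"], ref, alt, (alt / total if total else 0.0)


for i, sample in enumerate(SAMPLES, start=1):
    gt, ref, alt, frac = alt_fraction(sample)
    print(f"sample{i}: GT={gt} ref={ref} alt={alt} alt_fraction={frac:.3f}")
```

With these numbers, the samples called 0/1 carry the deletion in roughly 2% to 6% of the AD-counted reads, well below the roughly 50% one would expect at a clean heterozygous site. That may help explain why the deletion is hard to see in the read views.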

    debug info :

  • rpoplin Dev Posts: 122 ✭✭✭

    Hi there,

    There aren't any deletions in the reads you are showing so I'm not sure what can be said about the evidence for or against the variant.

    I wonder if the issue might be that you are running BaseRecalibrator on a small targeted-sequencing BAM. Maybe there isn't enough data to accurately estimate the error rate? What happens when you run with the original, unrecalibrated BAM file?

