To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

HaplotypeCaller INDEL calling, inconsistent with bamfile alignment

I have WES data of 235 samples, aligned using bwa aln. I followed GATK's Best Pratice for marking duplicates, realignment around INDEL, and BQSR until I got the recalibrated bamfiles. All these are done with GATK 2.7-2.
After that I generated gvcf file from the recalibrated bamfiles, using HaplotypeCaller from GATK 3.1-1, followed by GenotypeGVCFs and VQSR.

I came across one INDEL, which passes VQSR,

The variant is
chr14 92537378 . G GC 11330.02 PASS AC=11;AF=0.023;AN=470;BaseQRankSum=0.991;ClippingRankSum=-6.820e-01;DP=

Indivudual genotypes seem very good. To name a few:
1800 0/0:65,0:65:99:0,108,1620 0/0:86,0:86:99:0,120,1800 0/1:23,25:48:99:1073,0,1971 0/1:51,39:90:99:1505,0,4106

But when I checked the variants in IGV, something strange popped out.

The pictures (and complete post) can be view here:

Judging from the alignment visualized by IGV, there are no insertions at that site, but an SNP in the next position, in all samples. But HaplotypeCaller calls G->GC insertion in some samples while not in other samples.
My questions are:
1) In variant calling HaplotypeCaller would do a local de-novo assembly in some region. Is the inconsistency between bamfiles and gvcf because of the re-assembly?
2) Why is the insertion called in some samples but not others, although the position in all samples looks similar?
3) There are other variants called adjacent or near this site, but later filtered by VQSR. Does that indicate the variants from this region cannot be trusted?
chr14 92537353 . C CG 303342.41 VQSRTrancheINDEL0.00to90.00+
chr14 92537378 . G GC 11330.02 PASS
chr14 92537379 rs12896583 T TGCTGCTGCTGCTGC,C 13606.25 VQSRTrancheINDEL0.00to90.00+
chr14 92537385 rs141993435 CTTT C 314.64 VQSRTrancheINDEL0.00to90.00+
chr14 92537387 rs12896588 T G 4301.60 VQSRTrancheSNP99.90to100.00
chr14 92537388 rs12896589 T C 4300.82 VQSRTrancheSNP99.90to100.00
chr14 92537396 . G GC,GCTGCTGCTGCTGCTGCTGCTGCTGCTGCTGC 4213.61 VQSRTrancheINDEL0.00to90.00+

3) I used GATK 2.7 for processing before calling gvcf files. Any possibility that if I change to 3.2, the bamfiles would look more similar to gvcf result? I have tried GATK3.2-2 for generating gvcf from GATK 2.7 bamfiles, the results are the same.


Sign In or Register to comment.