HaplotypeCaller misses a true variant

flapa (Bologna), Member, Posts: 3

I'm updating my pipeline for exome sequencing analysis, so I'm trying out the HaplotypeCaller. I analyzed the same sample with both the UnifiedGenotyper (UG) and the HaplotypeCaller (HC) walkers and compared the two output VCF files, and I found something worrying: HC failed to find a true novel variant! I know this is a true variant because I validated it with Sanger sequencing after the first calling with UG.

I have run UG using GATK version 1.6-11-g3b2fab9. This is the VCF line of the variant:

chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0

I have run HC using GATK version 2.7-4-g6f46d11 in both single-sample and multi-sample mode, but there is no trace of this variant in the VCF output.
I also noticed that, along with this novel variant, HC missed two other variants upstream of it; these are their VCF lines:

chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176

chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249

How is it possible?

Many thanks in advance

Best, Flavia
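
For readers who want to compare the three UG records above side by side, here is a minimal Python sketch that parses them and prints a few of their annotations. It assumes single-sample, whitespace-delimited lines exactly as pasted in the post; `parse_vcf_record` is a hypothetical helper written for this illustration, not part of GATK. In these records the two upstream calls carry a much higher FS and a lower MQ than the novel variant, and both have the DS flag set.

```python
# Hypothetical helper for illustration only; not a GATK API.
def parse_vcf_record(line):
    """Parse one single-sample VCF data line (whitespace-delimited) into a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = line.split()
    info_map = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flag-style INFO entries (e.g. DB, DS) have no "=value" part.
        info_map[key] = value if sep else True
    genotype = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref, "alt": alt,
        "qual": float(qual), "filter": flt, "info": info_map, "genotype": genotype,
    }

# The three UG records, copied from the post above.
ug_lines = [
    "chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0",
    "chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176",
    "chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249",
]

records = [parse_vcf_record(line) for line in ug_lines]

# Print the records in positional order with a few annotations.
for rec in sorted(records, key=lambda r: r["pos"]):
    info = rec["info"]
    print(
        rec["pos"],
        "GT=" + rec["genotype"]["GT"],
        "AD=" + rec["genotype"]["AD"],
        "FS=" + info["FS"],
        "MQ=" + info["MQ"],
        "ReadPosRankSum=" + info["ReadPosRankSum"],
        "DS=" + str(info.get("DS", False)),
    )
```
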

Best Answers


  • flapa (Bologna), Member, Posts: 3

    Hi Geraldine,

    I've just uploaded my data to the FTP server in a file named flapa_data.tar.gz; it contains a BAM file for the whole of chr7, where the uncalled variants fall.

    I hope this can be helpful!


  • Geraldine_VdAuwera, Administrator, Dev, Posts: 10,970

    I was able to reproduce your issue, so I'm now passing this on to the devs for in-depth debugging.

    Geraldine Van der Auwera, PhD

  • ebanks (Broad Institute), Member, Administrator, Broadie, Moderator, Dev, Posts: 698

    Hi Flavia,

    I've taken a look at your example and would like to explain what's happening. If you look carefully at the HC call in that region you'll notice that it assembles it into a very large (120bp) deletion (with 90% of your reads supporting that call). The HC believes that those "SNPs" aren't real, but rather are artifacts from a misalignment around the deletion.

    I've attached a screenshot of your data that illustrates it quite nicely. The upper half shows the nice clean HC re-alignments around the deletion. The lower half shows the original reads; notice that the coverage drops dramatically over the deletion and that those "SNPs" occur near the breakpoints. These are classic signs of mis-alignments.

    Is it possible that the Sanger sequencing validation could be interpreted in this way too?

    [Attached screenshot: HC re-alignments around the deletion (upper half) and the original reads (lower half)]

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • flapa (Bologna), Member, Posts: 3

    Hi Eric,

    thank you so much for your very clear answer.
    The gene sequence is very repetitive, so after your explanation I think the Sanger sequencing could also be interpreted this way.
    I'm now trying to perform a more specific PCR, and I'll let you know whether I can replicate the validation.

    Thanks for your help!
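
Eric's point that the "SNPs" sit near the deletion breakpoints can be checked mechanically once the deletion call is known. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the 20 bp window is an arbitrary choice, and the coordinates in the usage lines are toy values, not the coordinates of the deletion in this sample (which the thread does not give). It relies on the VCF convention that a deletion's POS is the base immediately before the deleted bases.

```python
# Hypothetical helpers for illustration; names and the window size are assumptions.
def deletion_breakpoints(del_pos, deleted_len):
    """Return (left, right) reference positions flanking a deletion.

    del_pos is the VCF POS of the deletion record, i.e. the last retained
    base before the deleted bases; right is the first retained base after them.
    """
    left = del_pos
    right = del_pos + deleted_len + 1
    return left, right


def near_breakpoint(snp_pos, del_pos, deleted_len, window=20):
    """True if snp_pos lies within `window` bp of either breakpoint."""
    left, right = deletion_breakpoints(del_pos, deleted_len)
    return min(abs(snp_pos - left), abs(snp_pos - right)) <= window


# Toy coordinates only: a 120 bp deletion whose VCF POS is 1000.
toy_del_pos, toy_del_len = 1000, 120
for toy_snp in (1005, 1060, 1130):
    print(toy_snp, near_breakpoint(toy_snp, toy_del_pos, toy_del_len))
```

To apply this to real data, substitute the POS and deleted length of the deletion record from the HC VCF and the positions of the UG SNP calls.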
