HaplotypeCaller misses a true variant

flapa, Bologna (Member)

I'm updating my exome sequencing analysis pipeline, so I'm trying out HaplotypeCaller (HC). I analyzed the same sample with both the UnifiedGenotyper (UG) walker and HC, compared the two output VCF files, and found something worrying: HC failed to call a true novel variant! I know this is a true variant because I validated it with Sanger sequencing after the first round of calling with UG.

I have run UG using GATK version 1.6-11-g3b2fab9. This is the VCF line of the variant:

chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0

I ran HC using GATK version 2.7-4-g6f46d11 in both single-sample and multi-sample mode, but there is no trace of this variant in the VCF output.
I also noticed that, along with this novel variant, HC missed two other variants just upstream of it; these are the VCF lines:

chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176

chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249

How is it possible?

Many thanks in advance

Best, Flavia

Best Answers


  • flapa, Bologna (Member)

    Hi Geraldine,

    I've just uploaded my data to the FTP server in a file named flapa_data.tar.gz; it contains a BAM file covering the whole of chr7, where the uncalled variants fall.

    I hope this can be helpful!


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    I was able to reproduce your issue, so I'm now passing this on to the devs for in-depth debugging.

  • ebanks, Broad Institute (Member, Broadie, Dev)

    Hi Flavia,

    I've taken a look at your example and would like to explain what's happening. If you look carefully at the HC call in that region you'll notice that it assembles it into a very large (120bp) deletion (with 90% of your reads supporting that call). The HC believes that those "SNPs" aren't real, but rather are artifacts from a misalignment around the deletion.

    I've attached a screenshot of your data that illustrates it quite nicely. The upper half shows the nice clean HC re-alignments around the deletion. The lower half shows the original reads; notice that the coverage drops dramatically over the deletion and that those "SNPs" occur near the breakpoints. These are classic signs of misalignment.

    Is it possible that the Sanger sequencing validation could be interpreted in this way too?

  • flapa, Bologna (Member)

    Hi Eric,

    Thank you so much for your very clear answer.
    The gene sequence is very repetitive, so after your explanation I think the Sanger sequencing could also be interpreted this way.
    I'm now trying to perform a more specific PCR, and I'll let you know whether I can replicate the validation.

    Thanks for your help!
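
For anyone who wants to look at the three UG records quoted in the question alongside Eric's explanation, here is a minimal Python sketch that parses them and prints the depth (DP), mapping quality (MQ) and strand bias (FS) annotations side by side. It is only an illustration: `parse_record` is a hypothetical helper, not part of GATK, and it splits on any whitespace because the records are pasted here with spaces (a real VCF file is tab-delimited).

```python
# Sketch only: parse the three UnifiedGenotyper records quoted in the question
# and list a few of their annotations side by side.
# Assumption: fields are split on any whitespace because the records are pasted
# with spaces in this thread; a real VCF file is tab-delimited.

UG_RECORDS = [
    "chr7 45123943 . A T 3436.17 PASS AC=2;AF=1.00;AN=2;BaseQRankSum=2.043;DP=114;Dels=0.00;FS=2.678;HRun=1;HaplotypeScore=0.0000;MQ=42.18;MQ0=1;MQRankSum=2.152;QD=30.14;ReadPosRankSum=-0.781;SB=-1010.47 GT:AD:DP:GQ:PL 1/1:8,105:114:99:3436,253,0",
    "chr7 45123881 rs61740891 C T 654.25 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=0.205;DB;DP=43;DS;Dels=0.00;FS=65.862;HRun=0;HaplotypeScore=2.2312;MQ=30.27;MQ0=1;MQRankSum=3.254;QD=15.22;ReadPosRankSum=-3.921;SB=-3.02 GT:AD:DP:GQ:PL 0/1:18,25:43:99:684,0,176",
    "chr7 45123888 . C T 161.90 PASS AC=1;AF=0.50;AN=2;BaseQRankSum=-2.293;DP=26;DS;Dels=0.00;FS=49.656;HRun=2;HaplotypeScore=0.0000;MQ=23.81;MQ0=1;MQRankSum=0.425;QD=6.23;ReadPosRankSum=-3.821;SB=-3.00 GT:AD:DP:GQ:PL 0/1:17,9:26:99:192,0,249",
]


def parse_record(line):
    """Turn one single-sample VCF data line into a dict (hypothetical helper)."""
    chrom, pos, _id, ref, alt, qual, _filter, info, fmt, sample = line.split()
    info_map = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flag annotations such as DB or DS carry no "=value" part.
        info_map[key] = value if sep else True
    # Pair each FORMAT key (GT, AD, ...) with the sample's value for it.
    call = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "info": info_map,
        "call": call,
    }


# Sort by position so the records read left to right along chr7.
records = sorted((parse_record(r) for r in UG_RECORDS), key=lambda r: r["pos"])

for rec in records:
    info = rec["info"]
    print(
        f'{rec["chrom"]}:{rec["pos"]} {rec["ref"]}>{rec["alt"]} '
        f'GT={rec["call"]["GT"]} AD={rec["call"]["AD"]} '
        f'DP={info["DP"]} MQ={info["MQ"]} FS={info["FS"]}'
    )
```

In these three records, the two upstream SNPs have lower DP and MQ and a much higher FS than the homozygous call at 45123943. That is in keeping with Eric's description, though the annotations alone don't prove a misalignment; looking at the reads, as in his screenshot, is still the real check.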
