GATK4.1.0.0 HaplotypeCaller called wrong variant


I ran into a strange issue. I was looking for EGFR C797S in my sample using GATK HaplotypeCaller, but HaplotypeCaller reported C797G instead:

chr7 55249091 COSM6493936 T G 9228.6 . AC=1;AF=0.500;AN=2;ANN=G|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.3|protein_coding|20/28|c.2389T>G|p.Cys797Gly|2635/5600|2389/3633|797/1210||;BaseQRankSum=-8.760e-01;DP=11630;ExcessHet=3.0103;FS=119.585;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.856;QD=1.33;ReadPosRankSum=20.47;SOR=8.736;AA=p.C797G;CDS=c.2389T>G;CNT=1;GENE=EGFR;STRAND=+ GT:AD:DP:GQ:PL 0/1:6078,878:8673:99:9236,0,203073

However, both the input BAM and the BAM output by HaplotypeCaller show the alt allele as A in IGV, not the G that HaplotypeCaller called. The software that comes with the sequencer called C797S correctly.
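As a quick sanity check on the record above, its annotations can be compared against GATK's generic hard-filtering recommendations for SNPs (QD < 2.0, FS > 60.0, SOR > 3.0). This is only a minimal sketch: those thresholds were designed for germline short-variant filtering, not for amplicon data, and the INFO string below is copied from the record with the long ANN field and the COSMIC fields left out for readability. The helper names are made up for illustration.

```python
# INFO annotations from the C797G record above (ANN and COSMIC fields omitted).
FIRST_CALL_INFO = (
    "AC=1;AF=0.500;AN=2;BaseQRankSum=-8.760e-01;DP=11630;ExcessHet=3.0103;"
    "FS=119.585;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.856;QD=1.33;"
    "ReadPosRankSum=20.47;SOR=8.736"
)

# Generic SNP hard-filter thresholds (an assumption for amplicon data):
# a record is flagged when the annotation is on the wrong side of the limit.
THRESHOLDS = {"QD": ("<", 2.0), "FS": (">", 60.0), "SOR": (">", 3.0)}


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def failed_filters(info):
    """Return the annotation names that cross their threshold."""
    fields = parse_info(info)
    failed = []
    for key, (op, limit) in THRESHOLDS.items():
        if key not in fields:
            continue
        value = float(fields[key])
        if (op == "<" and value < limit) or (op == ">" and value > limit):
            failed.append(key)
    return failed


print(failed_filters(FIRST_CALL_INFO))  # ['QD', 'FS', 'SOR']
```

All three annotations cross their thresholds, so even without looking at IGV this record would be treated as suspect under those generic filters.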

I ran HaplotypeCaller with all default settings except "--max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter", because we work with amplicon-based data.

Any suggestions?

Thanks a lot for the help!



  • yingchen69, nanjing (Member)

    Just an update: I ran HaplotypeCaller with more non-default settings, "--max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter --max-num-haplotypes-in-population 1600 --kmer-size 10 --kmer-size 15 --kmer-size 25", and now it outputs the correct variant call (C797S):

    chr7 55249091 rs1057519861 T A 174207.77 PASS AC=1;AF=0.500;AN=2;ANN=A|missense_variant|MODERATE|EGFR|EGFR|transcript|NM_005228.3|protein_coding|20/28|c.2389T>A|p.Cys797Ser|2635/5600|2389/3633|797/1210||,A|non_coding_transcript_exon_variant|MODIFIER|EGFR-AS1|EGFR-AS1|transcript|NR_047551.1|pseudogene|2/2|n.1173A>T||||||;BaseQRankSum=-8.430e-01;DB;DP=12569;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.99;MQRankSum=-4.950e-01;QD=16.45;ReadPosRankSum=0.527;SOR=0.732 GT:AD:DP:GQ:PL 0/1:5425,5165:10605:99:174236,0,186486

    Could you please help me understand why running with "--max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter" outputs the wrong alt allele (G), while running with "--max-reads-per-alignment-start 0 --disable-read-filter NotDuplicateReadFilter --max-num-haplotypes-in-population 1600 --kmer-size 10 --kmer-size 15 --kmer-size 25" outputs the correct alt allele (A)?
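To put numbers on how differently the two runs treated this site, the alt allele fraction can be computed from the AD (allelic depth) values in the sample column of each record. This is a minimal sketch: the FORMAT and sample strings are copied from the two records in this thread, and the function name is made up for illustration.

```python
def alt_fraction(format_keys, sample_values):
    """Alt allele fraction from the AD field of a biallelic VCF sample column."""
    sample = dict(zip(format_keys.split(":"), sample_values.split(":")))
    ref_depth, alt_depth = (int(x) for x in sample["AD"].split(","))
    return alt_depth / (ref_depth + alt_depth)


FORMAT = "GT:AD:DP:GQ:PL"

# Sample columns from the two HaplotypeCaller runs reported in this thread.
runs = {
    "T>G (first settings)": "0/1:6078,878:8673:99:9236,0,203073",
    "T>A (extra kmer sizes)": "0/1:5425,5165:10605:99:174236,0,186486",
}

for label, sample in runs.items():
    print(f"{label}: alt fraction = {alt_fraction(FORMAT, sample):.3f}")
# T>G (first settings): alt fraction = 0.126
# T>A (extra kmer sizes): alt fraction = 0.488
```

The G call is supported by only about 13% of the informative reads, whereas the A call sits near the 50% expected for a heterozygous site, which matches what IGV shows.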

    Thanks a lot,


  • yingchen69, nanjing (Member)

    Hi GATK team,

    Any suggestions?



  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @yingchen69

    Sorry for the delay. We will look into this immediately, and I will get back to you with an answer soon.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)


    Regarding the HC call on amplicon sequencing: we don't have Best Practices for amplicon sequencing, and because the depth is so high, reconstructing haplotypes can sometimes be tricky.
    One thing you could try is to get the latest nightly docker build (from today) and run it with the --recover-all-dangling-branches option enabled.
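The suggestion above could be scripted roughly as follows. This is only a sketch under stated assumptions: the file names and the interval are hypothetical placeholders, the amplicon settings are the ones quoted earlier in this thread, and --recover-all-dangling-branches is assumed to be a simple on/off switch that is only present in builds that include it. The command is assembled and printed rather than executed, so it can be reviewed first.

```python
import shlex


def build_haplotypecaller_command(reference, bam, output_vcf, interval=None):
    """Assemble a HaplotypeCaller command line as a list of arguments."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", output_vcf,
        # Amplicon settings quoted earlier in this thread.
        "--max-reads-per-alignment-start", "0",
        "--disable-read-filter", "NotDuplicateReadFilter",
        # Option suggested in the reply; assumed to take no value.
        "--recover-all-dangling-branches",
    ]
    if interval:
        cmd += ["-L", interval]
    return cmd


# Hypothetical inputs: replace with real paths before running.
command = build_haplotypecaller_command(
    "reference.fasta", "sample.bam", "sample.vcf.gz", interval="chr7:55249091"
)
print(shlex.join(command))
```

Once the printed command looks right, it can be passed to subprocess.run(command, check=True) inside the nightly docker container.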
