Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

HaplotypeCaller generate incorrect heterozygous Calls

nkausthunkausthu IndiaMember
edited August 29 in Ask the GATK team

Hi,
We have used GATK 3.6 and GATK 4.1 haplotype caller, to generate the variants, unfortunately, we are getting lots of incorrect heterozygous calls. Even only one read supports the variant it is considered as a heterozygous call. It would be great if someone can suggest what can be done in this scenario.

Chr Start End Ref Alt Zygosity Read frequency Read No Genotype Quality
1 27101249 27101249 - GGCGGGGCCCTGGGG Het 164,1 165 99
6 33141690 33141690 - GTAGGGTCCACGGGGTCAGCGGG Het 158,1 163 99
10 71906033 71906033 - G Het 150,9 160 91
11 66411384 66411384 C T Het 144,5 155 64
11 66411384 66411384 C T Het 144,5 155 64
17 79880575 79880575 G T Het 147,8 155 83
18 23807079 23807079 - ACTTTGGTCA Het 158,4 162 99
19 36276206 36276206 - GT Het 144,9 153 99
19 50340140 50340140 - GGGGGACCGGGCCCCGGGGA Het 154,3 157 99

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @nkausthu

    Can you please provide GATK 4.1 haplotypecaller bamout using the following instructions.

  • nkausthunkausthu IndiaMember

    Hi,
    we have transferred the GATK 4.1 haplotypecaller bamout and the zipped folder name is KMC_MANIPAL_GATK4_1

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham

    we had transferred the bamout, any update on this ?

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @nkausthu

    I am looking into this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @nkausthu

    1) I am looking at your data right now. Can you please clarify what the two files bamout_1_v4.1.bam and bamout_2_v4.1.bam are? And how are they different?
    2) Also please share with us the vcf file generated by Haplotypecaller using the instructions provided here: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham
    The two bamout files are for two different regions. The variants belong to those regions are given in variants_selected.txt file. We will share the vcf file soon

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham
    we have transferred the vcf file '80013_S1.zip.'

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @nkausthu

    I am looking into this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited September 20

    Hi @nkausthu

    The vcf you shared with us was generated using GATK3.6 version.

    Please share with us only data that was generated by GATK4 and preferably the latest GATK4.1.3.0. We do not support GATK3 anymore.

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham
    we have transferred VCF file generated with GATK4.13 'gatk_kmc_vcf.zip'

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham
    Any update on this?

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited October 20

    Hi @nkausthu

    Sorry about the delay in getting back to you. We are facing a high volume of questions. We will get back to you shortly.

    Post edited by bhanuGandham on
  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham,
    Any update on this ?

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @nkausthu

    In the vcf you shared with us I am unable to find any records of the variants you mentioned above:

    Chr Start End Ref Alt Zygosity Read frequency Read No Genotype Quality
    1 27101249 27101249 - GGCGGGGCCCTGGGG Het 164,1 165 99
    6 33141690 33141690 - GTAGGGTCCACGGGGTCAGCGGG Het 158,1 163 99
    10 71906033 71906033 - G Het 150,9 160 91
    11 66411384 66411384 C T Het 144,5 155 64
    11 66411384 66411384 C T Het 144,5 155 64
    17 79880575 79880575 G T Het 147,8 155 83
    18 23807079 23807079 - ACTTTGGTCA Het 158,4 162 99
    19 36276206 36276206 - GT Het 144,9 153 99
    19 50340140 50340140 - GGGGGACCGGGCCCCGGGGA Het 154,3 157 99

    Can you please share records of variants from the vcf that have the erroneous het calls.

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham

    We have again transferred the new folder "NOV_2_VCF_4_1.zip" which contains VCF file called using GATK 4.1 and the corresponding erroneous het calls in text file. Please use this version.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @nkausthu

    I am looking into this. In the mean time can you please post the exact version of GATK you used to create this vcf and the exact command.

  • nkausthunkausthu IndiaMember

    Hi @bhanuGandham

    GATK version: gatk-4.1.3.0
    gatk --java-options "-Xmx20g" HaplotypeCaller -R /mnt/exome/ReferenceFiles/human_g1k_v37.fasta -I 19CT037976-19IN000950-M_80013_S1_recal.bam -L sureselect_crev2.b37.pad50.bed -O VCF_4_1.vcf

  • I have a similar problem with GATK 4.1.1.0. Haplotype caller supports calls with 2 reads coverage.

    There was no issue on the same sample in 4.0.12.1 version (older version supports 100 000 less calls for exome)

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL

    chr1 136391 . T G 220.6 PASS AC=1;AF=0.5;AN=2;BaseQRankSum=3.03;DP=99;ExcessHet=3.0103;FS=4.195;MLEAC=1;MLEAF=0.5;MQ=40;MQRankSum=-0.337;QD=2.23;ReadPosRankSum=-0.876;SOR=1.424;GT:AD:DP:GQ:PL 0/1:82,17:99:99:228,0,2142
    chr1 768923 . C A 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 770988 . A G 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=43;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 802525 . G T 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=48.01;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 821224 . A G 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=60;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 840279 . A G 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=50.99;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 861154 . T C 37.28 PASS AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=1;MLEAF=0.5;MQ=48.01;QD=18.64;SOR=0.693; GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0

    same calls in older version which won't pass hardfiltering:

    chr1 136391 . T G 191.77 HDFLT_MQ40.00;HDFLT_QD2.00 AC=1;AF=0.5;AN=2;BaseQRankSum=2.12;DP=100;ExcessHet=3.0103;FS=2.41;MLEAC=1;MLEAF=0.5;MQ=39.52;MQRankSum=-1.582;QD=1.92;ReadPosR
    chr1 768923 . C A 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 770988 . A G 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=43;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 802525 . G T 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=48.01;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 821224 . A G 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=60;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 840279 . A G 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=50.99;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0
    chr1 861154 . T C 21.77 HDFLT_QUAL30.00 AC=2;AF=1;AN=2;DP=2;ExcessHet=3.0103;FS=0;MLEAC=2;MLEAF=1;MQ=48.01;QD=10.88;SOR=0.693 GT:AD:DP:GQ:PL 1/1:0,2:2:6:49,6,0

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    @nkausthu and @Alexandra_Zhayvoron I am checking with the dev team about this and will get back to you shortly.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @nkausthu and @Alexandra_Zhayvoron

    This does look weird. We want to investigate this further. Can you please share with the input bam and the bamout corresponding to the erroneous vcf records.

    Note: @nkausthu I know I asked this data from you earlier but in the back and forth I am not sure which files correspond to the latest vcf file and I want to be sure we are looking at the right data. So if you could please send it again that would be very helpful.

    Please follow these directions for sharing data: https://software.broadinstitute.org/gatk/guide/article?id=1894

  • nkausthunkausthu IndiaMember

    hi @bhanuGandham

    We have uploaded the file "KMC_VCF_Nov_08_19.zip" to the FTP server.

    It contains three snippets extracted from the processed bam file, with the offending chr_position being the bam file name. The vcf from the main bam file is also included in the file, along with the commands which were used to run the program.

    Kindly look into the files. If anything else is needed let us know!

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited November 25

    Hi all,

    Sorry for the delay in getting back to you. I checked with the dev team and this is what they had to say:
    So the issue here is that the subset of reads being used to make calls and genotype is different from the subset of reads being used to calculate annotations. The difference is particularly striking for reads with a leading insertion, which is what was causing the particularly surprising AD values. In essence, there was significant evidence of an insertion in the data, but all the evidence was from the beginning of reads with leading insertions, which were not being counted when calculating the AD annotation.

    These are issues the gatk team is looking into fixing. I am unclear on what the timeline to get this fixed is. I will try and get that information and share it here shortly.

  • nkausthunkausthu IndiaMember

    Thank you so much for the reply and looking forward to hearing from you as soon as possible.

  • TalDahanTalDahan Member

    I am experiencing the same issue with my samples. HapCaller calls heterozygoutes on one read out of many. Looking forward to you answer. Thank you.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited December 2

    Hi all,

    Here are the github issue tickets for the following issues:
    1. Issue that makes the reads used for genotying and annotating inconsistent: https://github.com/broadinstitute/gatk/issues/5434
    2. Bug in the ReadPositionRankSum annotation: https://github.com/broadinstitute/gatk/issues/6294

    You can follow the links to keep track of the progress made to these issues.

Sign In or Register to comment.