Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

Aberrant VQSLOD scores

FredericFFredericF Member
edited May 31 in Ask the GATK team

Hello,

I recently reanalyzed a large number (>10,000) of targeted exome sequencing bam files using GATK 4.1.0.0 following the gatk4-germline-snps-indels cromwell workflows (haplotypecaller-gvcf then joint-discovery). (The bam files were produced a couple of years ago using BWA 0.7.10, Picard 1.123 and GATK 3.2-2 in a pipeline that included trimming, alignment (bwa), indel realignment, fixing read mates, marking duplicates and base quality recalibration.)

In the results, I noticed many deletions that felt unexpected. Looking at the bam files for those deletions, I find no read that overlap the deletion (although there are clipped reads that stop at the called deletion point). Looking more closely at the INFO field, I noticed that they have aberrant VQSLOD scores (VQSLOD score > 2e12). There is quite a clear gap in score between the rest of the loci (vqslod < 40) and those that I call aberrant (vqslod > 2e12). All the loci with aberrant vsqlod scores are indels.
Is this a known bug? What could lead to this?

Many thanks for your help!

Post edited by FredericF on

Answers

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin
    edited June 3

    Hi @FredericF

    You are using a very old version of GATK Haplotypecaller that we do not support. Please see the updated gatk4-germline-snps-indels workflow with updated tools. Please let me know if you still see the error after updating the tools.

  • FredericFFredericF Member

    Hi Bhanu,

    HaplotypeCaller version used was 4.1.0.0 is it really unsupported already? The gatk4-germline-snps-indels workflow used was forked from github this spring.

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    @FredericF I misread the question and thought you were using GATK 3.2-2.

    Would you please post 1) vcf records and 2) IGV screenshots of bam and bamout for some of these regions.

  • FredericFFredericF Member

    Hello,
    Thanks for getting back.

    Here is a screenshot of the top of the bam for the region:

    Here is the top of the bamout for the same region:

    Here are the calls for the region. Notice the very extreme VQSLOD.
    8 90958393 . A *,ATTTT,ATTTTTT 167.99 PASS AC=1,0,0;AF=0.029,3.94e-05,3.94e-05;AN=2;BaseQRankSum=-6.339;DP=625604;ExcessHet=52.0777;FS=77.979;InbreedingCoeff=-0.0146;MLEAC=758,1,1;MLEAF=0.03,3.94e-05,3.94e-05;MQ=59.28;MQRankSum=-5.336;QD=0;ReadPosRankSum=-4.127;SOR=17.535;VQSLOD=3.42415e+18;culprit=SOR GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:82,25,0,0:107:99:0|1:90958392_TAG_T:704,0,3839,951,3914,4865,951,3914,4865,4865:90958392
    8 90958394 . G *,T 1959.05 VQSRTrancheSNP99.95to100.00 AC=1,0;AF=0.028,0.0005911;AN=2;BaseQRankSum=-6.147;DP=626454;ExcessHet=15.8314;FS=77.979;InbreedingCoeff=-0.0116;MLEAC=740,13;MLEAF=0.029,0.0005123;MQ=59.35;MQRankSum=-4.96;QD=0.02;ReadPosRankSum=-4.123;SOR=16.705;VQSLOD=-209.9;culprit=SOR GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:82,25,0:107:99:0|1:90958392_TAG_T:704,0,3839,951,3914,4865:90958392
    8 90958396 . TGACCATAATCATCA TTAATCATCA,TATAATCATCA,TTCATCA,T 234379 PASS AC=1,0,0,0;AF=0.027,7.881e-05,3.94e-05,0.000394;AN=2;BaseQRankSum=-6.068;DP=659901;ExcessHet=45.945;FS=49.361;InbreedingCoeff=-0.0282;MLEAC=682,2,1,7;MLEAF=0.027,7.881e-05,3.94e-05,0.0002758;MQ=59.44;MQRankSum=-6.855;QD=2.02;ReadPosRankSum=-1.705;SOR=18.552;VQSLOD=5.15068e+18;culprit=SOR GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:128,25,0,0,0:153:99:0|1:90958392_TAG_T:563,0,6415,951,6490,7442,951,6490,7442,7442,951,6490,7442,7442,7442:90958392
    8 90958397 . GACCATAATCATCA *,TACCATAATCATCA 170.45 PASS AC=1,0;AF=0.027,3.94e-05;AN=2;BaseQRankSum=-4.462;DP=672555;ExcessHet=47.486;FS=52.177;InbreedingCoeff=-0.0225;MLEAC=718,1;MLEAF=0.028,3.94e-05;MQ=59.34;MQRankSum=-6.259;QD=0;ReadPosRankSum=-0.405;SOR=18.553;VQSLOD=5.92821e+18;culprit=SOR GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:128,25,0:153:99:0|1:90958392_TAG_T:563,0,6415,951,6490,7442:90958392
    8 90958400 rs200564603 C *,T 2101.95 VQSRTrancheSNP99.95to100.00 AC=1,0;AF=0.026,0.0003152;AN=2;BaseQRankSum=-3.06;DB;DP=660349;ExcessHet=25.8301;FS=52.177;InbreedingCoeff=-0.0189;MLEAC=678,8;MLEAF=0.027,0.0003152;MQ=59.44;MQRankSum=-5.877;QD=0.02;ReadPosRankSum=0.921;SOR=18.476;VQSLOD=-243.2;culprit=SOR GT:AD:DP:GQ:PGT:PID:PL:PS 0|1:128,25,0:153:99:0|1:90958392_TAG_T:563,0,6415,951,6490,7442:90958392

    From the original bam file, I don't see a lot going on. From the bamout, I see a hard region to assess. If this is not simply a bug, I wonder where the confidence could come from to provide such scores.

    Many thanks for your help!

    Frederic

  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    Hi @FredericF

    I spoke to our dev team and this is what they recommended:
    Try trouble shooting by making some plots by using --rscript_file option of VariantRecalibrator which would give negative/positive model distributions for different annotations and seeing if these distributions make sense for given annotation values.

Sign In or Register to comment.