Bug Bulletin: The recent 3.2 release fixes many issues. If you run into a problem, please try the latest version before posting a bug report, as your problem may already have been solved.

Unreported Variant despite high QUAL

newbie16newbie16 Posts: 15Member
edited December 2012 in Ask the GATK team

Hello,

I am trying to investigate why a variant (SNP) is not reported in the vcf file when I use the output_mode as "EMIT_VARIANTS_ONLY". However, when I use "EMIT_ALL_SITES" the site shows a very high QUAL score for the variant. Below is the output my region of interest:

17      7578393 .       A       .       11642   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3927
17      7578394 .       T       A       32767.01        .       AC=1;AF=0.500;AN=2;BaseQRankSum=-15.336;DP=3927;Dels=0.10;FS=261.618;HaplotypeScore=6301.9653;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=-0.197;QD=8.34;ReadPosRankSum=0.049;SB=-5.709e+03 GT:AD:DP:GQ:PL  0/1:1588,1922:3526:99:32767,0,32767
17      7578395 .       G       A       984.01  .       AC=1;AF=0.500;AN=2;BaseQRankSum=-18.165;DP=3927;Dels=0.08;FS=2960.660;HaplotypeScore=5221.3579;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=1.442;QD=0.25;ReadPosRankSum=1.547;SB=-6.519e-03 GT:AD:DP:GQ:PL  0/1:3218,401:3619:99:1014,0,32767
17      7578396 .       G       .       9145    .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910
17      7578397 .       T       .       11392   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910

The questionable variant is T to A variant at Chr17:7578394 position.

The command I used is:

 java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.0-35-g2d70733/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I CL255.ordered.sorted.realigned.bam -o CL255_SNP.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0 -dcov 5000 -L "17:7,578,336-7,578,451" -out_mode EMIT_ALL_SITES_

I am unable to attach the data since uploading bam seems to be disallowed.

Any help is appreciated. Thanks

Post edited by Geraldine_VdAuwera on

Best Answer

Answers

  • newbie16newbie16 Posts: 15Member

    Sorry the formatting of vcf results was bad in my previous post. Below should be better:

    17 7578393 . A . 11642 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3927

    17 7578394 . T A 32767.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-15.336;DP=3927;Dels=0.10;FS=261.618;HaplotypeScore=6301.9653;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=-0.197;QD=8.34;ReadPosRankSum=0.049;SB=-5.709e+03 GT:AD:DP:GQ:PL 0/1:1588,1922:3526:99:32767,0,32767

    17 7578395 . G A 984.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-18.165;DP=3927;Dels=0.08;FS=2960.660;HaplotypeScore=5221.3579;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=1.442;QD=0.25;ReadPosRankSum=1.547;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:3218,401:3619:99:1014,0,32767

    17 7578396 . G . 9145 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910

    17 7578397 . T . 11392 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,877Administrator, GATK Developer admin

    Hi there,

    Please have a look at this article and check whether your case is covered. If not, please try running your command again with the latest version of GATK (some upgrades we've made since 2.0 may affect your case) and let us know if the problem persists.

    Re: uploads, the forum only allows uploads of text files and images. For BAM uploads, we have an FTP server (see the FAQs). However, we ask that people only upload BAMs (or BAM snippets) once we've determined that the problem requires it. We'll let you know if it's necessary for your case.

    Geraldine Van der Auwera, PhD

  • ebanksebanks Posts: 678GATK Developer mod

    It's definitely covered in that article...

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • newbie16newbie16 Posts: 15Member

    Hi,

    Thanks for the suggestions. I did look into the article: Below are the observations for my data: 1) How many overlapping deletions are there at the position?

    There are 41 deletions in 3567 reads which is 0.01 %. This is less than default 0.05% for --max_deletion_fraction

    2) What do the base qualities look like for the non-reference bases?

    The base qualities for the desired variant bases ("A") are good. Only 0.02% are below 17.

    3) What do the mapping qualities look like for the reads with the non-reference bases?

    The mapping qualities are in the range of 40-42

    4) Are there a lot of alternate alleles?

    I used the default value for alternate alleles. Below is the breakup of the alleles seen by IGV: T (Reference) : 1588 : 48% (1125+, 463-) A (Desired Var) : 1922 : 45% (1647+, 275-) C: 0 G: 16 : <1% N: 0 Del: 41 Ins: 14

    5) Are you working with SOLiD data?

    No. This is 454 data.

    I also tried with GATK Ver 2.2.15 with no change except for minor difference in QUAL score for the variant.

    Also, please note that variant is getting a very high QUAL score (32767) but is being filtered out later. So my confusion is not based just on observing high frequency in IGV but also that this variant is getting a very very high QUAL score with emit_all_sites. For allele frequency reported in vcf and likelihoods please refer the vcf results snapshot I had pasted earlier. Please let me know if you need more information regarding this issue.

    Thanks for your help.

  • newbie16newbie16 Posts: 15Member

    Hi,

    Thanks for your response. My earlier understanding "--max_deletion_fraction" filter implies deletions % at the locus of interest. I now understand that "--max_deletion_fraction" filter implies reads that span the locus of interest but contain a deletion at any position (not only the position of interest). Would there be any further documentation on this filter other than what we see on the "Unified Genotyper" help page.

    Thanks a lot for helping.

Sign In or Register to comment.