Hello,
I am trying to work out why a variant (SNP) is not reported in the VCF file when I set the output mode to "EMIT_VARIANTS_ONLY". When I use "EMIT_ALL_SITES" instead, the site is reported with a very high QUAL score for the variant. Below is the output for my region of interest:
17 7578393 . A . 11642 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3927
17 7578394 . T A 32767.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-15.336;DP=3927;Dels=0.10;FS=261.618;HaplotypeScore=6301.9653;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=-0.197;QD=8.34;ReadPosRankSum=0.049;SB=-5.709e+03 GT:AD:DP:GQ:PL 0/1:1588,1922:3526:99:32767,0,32767
17 7578395 . G A 984.01 . AC=1;AF=0.500;AN=2;BaseQRankSum=-18.165;DP=3927;Dels=0.08;FS=2960.660;HaplotypeScore=5221.3579;MLEAC=1;MLEAF=0.500;MQ=41.68;MQ0=0;MQRankSum=1.442;QD=0.25;ReadPosRankSum=1.547;SB=-6.519e-03 GT:AD:DP:GQ:PL 0/1:3218,401:3619:99:1014,0,32767
17 7578396 . G . 9145 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910
17 7578397 . T . 11392 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910
The variant in question is the T-to-A SNP at position 17:7578394.
The command I used is:
java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.0-35-g2d70733/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I CL255.ordered.sorted.realigned.bam -o CL255_SNP.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0 -dcov 5000 -L "17:7,578,336-7,578,451" -out_mode EMIT_ALL_SITES
I am unable to attach the data because uploading BAM files does not seem to be allowed.
Any help is appreciated. Thanks.
Hi there,
Please have a look at this article and check whether your case is covered. If not, please try running your command again with the latest version of GATK (some upgrades we've made since 2.0 may affect your case) and let us know if the problem persists.
Re: uploads, the forum only allows uploads of text files and images. For BAM uploads, we have an FTP server (see the FAQs). However, we ask that people only upload BAMs (or BAM snippets) once we've determined that the problem requires it. We'll let you know if it's necessary for your case.
Geraldine Van der Auwera, PhD
It's definitely covered in that article...
Eric Banks, PhD -- Group Leader, Methods Development, MPG, Broad Institute of Harvard and MIT
Hi,
Thanks for the suggestions. I did look into the article. Below are the observations for my data:
1) How many overlapping deletions are there at the position?
2) What do the base qualities look like for the non-reference bases?
3) What do the mapping qualities look like for the reads with the non-reference bases?
4) Are there a lot of alternate alleles?
5) Are you working with SOLiD data?
I also tried GATK version 2.2.15; nothing changed except for a minor difference in the QUAL score for the variant.
Also, please note that the variant gets a very high QUAL score (32767) but is filtered out later. So my confusion is not based only on seeing a high allele frequency in IGV, but also on the fact that this variant gets a very high QUAL score with EMIT_ALL_SITES. For the allele frequency and likelihoods reported in the VCF, please refer to the VCF output I pasted earlier. Please let me know if you need more information about this issue.
Thanks for your help.
Hi,
Thanks for your response. My earlier understanding was that the "--max_deletion_fraction" filter refers to the percentage of deletions at the locus of interest. I now understand that it refers to reads that span the locus of interest but contain a deletion at any position (not only the position of interest). Is there any further documentation on this filter beyond what is on the "Unified Genotyper" help page?
Thanks a lot for helping.
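
For anyone hitting the same thing, here is a rough Python sketch of how one might scan EMIT_ALL_SITES output for sites whose Dels annotation is above the deletion-fraction cutoff. It is only an illustration, not how GATK does the check internally. The records are the ones pasted in this thread with the INFO column trimmed to DP and Dels for brevity. The cutoff of 0.05 is an assumption (I believe it is the UnifiedGenotyper default for --max_deletion_fraction, but please check the help page for your version), and the function names are made up for this example.

```python
# Assumed cutoff: believed to be the UnifiedGenotyper default for
# --max_deletion_fraction; verify against the docs for your GATK version.
ASSUMED_MAX_DELETION_FRACTION = 0.05

# Records from this thread, with INFO trimmed to DP and Dels for brevity.
RECORDS = [
    "17 7578393 . A . 11642 . DP=3927",
    "17 7578394 . T A 32767.01 . DP=3927;Dels=0.10",
    "17 7578395 . G A 984.01 . DP=3927;Dels=0.08",
    "17 7578396 . G . 9145 . DP=3927",
]


def parse_info(info):
    """Turn a VCF INFO string such as 'DP=10;Dels=0.10' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        # Flag-style entries have no '=' and are stored as True.
        fields[key] = value if value else True
    return fields


def deletion_fraction(vcf_line):
    """Return the Dels value of one VCF record, or 0.0 if it is absent."""
    columns = vcf_line.split()
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    return float(info.get("Dels", 0.0))


def sites_over_cutoff(lines, cutoff=ASSUMED_MAX_DELETION_FRACTION):
    """List positions whose Dels annotation is above the cutoff."""
    return [
        int(line.split()[1])
        for line in lines
        if deletion_fraction(line) > cutoff
    ]


if __name__ == "__main__":
    for pos in sites_over_cutoff(RECORDS):
        print(f"17:{pos} has Dels above {ASSUMED_MAX_DELETION_FRACTION}")
```

With the records above, both 17:7578394 (Dels=0.10) and 17:7578395 (Dels=0.08) are flagged under the assumed cutoff, which would be consistent with the site being dropped in EMIT_VARIANTS_ONLY mode despite the high QUAL.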