Unreported Variant despite high QUAL


Hello,

I am trying to investigate why a variant (SNP) is not reported in the VCF file when I set output_mode to "EMIT_VARIANTS_ONLY". However, when I use "EMIT_ALL_SITES", the site shows a very high QUAL score for the variant. Below is the output for my region of interest:

17      7578393 .       A       .       11642   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3927
17      7578396 .       G       .       9145    .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910
17      7578397 .       T       .       11392   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910


The variant in question is a T-to-A change at position Chr17:7578394.

The command I used is:

 java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.0-35-g2d70733/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I CL255.ordered.sorted.realigned.bam -o CL255_SNP.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0 -dcov 5000 -L "17:7,578,336-7,578,451" -out_mode EMIT_ALL_SITES


I am unable to attach the data since uploading BAM files seems to be disallowed.

Any help is appreciated. Thanks


• Posts: 15Member

Sorry, the formatting of the VCF results was bad in my previous post. The following should be better:

17 7578393 . A . 11642 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3927

17 7578396 . G . 9145 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910

17 7578397 . T . 11392 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910
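Editorial aside: the three records above skip some positions, including the one the question is about. The sketch below is one rough way to list positions that are absent between the first and last record of an excerpt like this. The helper names `parse_positions` and `missing_positions` are made up for illustration, and the records are simply copied from the post.

```python
# Records copied from the post (whitespace-separated VCF columns).
records = """\
17 7578393 . A . 11642 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3927
17 7578396 . G . 9145 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910
17 7578397 . T . 11392 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910
"""


def parse_positions(text):
    """Return the POS column (second field) of every non-header record."""
    positions = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.split()
        positions.append(int(fields[1]))
    return positions


def missing_positions(positions):
    """List positions between the smallest and largest POS that have no record."""
    seen = set(positions)
    return [p for p in range(min(positions), max(positions) + 1) if p not in seen]


positions = parse_positions(records)
missing = missing_positions(positions)
print("positions with no record:", missing)
```

This only checks the pasted excerpt, so it cannot tell whether a position was dropped by the caller or just left out when the output was copied into the post.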

Hi there,

Re: uploads, the forum only allows uploads of text files and images. For BAM uploads, we have an FTP server (see the FAQs). However, we ask that people only upload BAMs (or BAM snippets) once we've determined that the problem requires it. We'll let you know if it's necessary for your case.

Geraldine Van der Auwera, PhD

• Posts: 678GATK Developer mod

It's definitely covered in that article...

Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

• Posts: 15Member

Hi,

Thanks for the suggestions. I did look into the article. Below are the observations for my data:

1) How many overlapping deletions are there at the position?

There are 41 deletions in 3567 reads, which is a fraction of about 0.011 (roughly 1.1%). This is less than the default of 0.05 (5%) for --max_deletion_fraction.

2) What do the base qualities look like for the non-reference bases?

The base qualities for the desired variant bases ("A") are good. Only 0.02% are below 17.

3) What do the mapping qualities look like for the reads with the non-reference bases?

The mapping qualities are in the range of 40-42

4) Are there a lot of alternate alleles?

I used the default value for alternate alleles. Below is the breakdown of the alleles seen by IGV:

T (Reference): 1588 : 48% (1125+, 463-)
A (Desired Var): 1922 : 45% (1647+, 275-)
C: 0
G: 16 : <1%
N: 0
Del: 41
Ins: 14

5) Are you working with SOLiD data?

No. This is 454 data.
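Editorial aside: the arithmetic behind answers 1 and 4 can be re-derived from the counts quoted above. This is a minimal sketch, assuming that the 0.05 threshold cited for --max_deletion_fraction is a fraction rather than a percentage, and that allele fractions are taken over the listed base counts only (IGV may use a different denominator, so its percentages need not match). All variable names are illustrative.

```python
# Counts as quoted in the post (IGV allele breakdown at the site).
base_counts = {"T": 1588, "A": 1922, "C": 0, "G": 16, "N": 0}
strand_counts = {"T": (1125, 463), "A": (1647, 275)}  # (forward, reverse)
deletions = 41
total_reads = 3567  # read count quoted alongside the deletion count

# Assumption: the threshold cited in the post is a fraction, not a percent.
MAX_DELETION_FRACTION = 0.05

deletion_fraction = deletions / total_reads
print(f"deletion fraction: {deletion_fraction:.4f} "
      f"(below threshold: {deletion_fraction < MAX_DELETION_FRACTION})")

# Allele fractions over the listed base counts only.
base_total = sum(base_counts.values())
allele_fractions = {base: n / base_total for base, n in base_counts.items()}
for base, frac in allele_fractions.items():
    print(f"{base}: {frac:.1%}")

# Share of forward-strand reads for each allele, as a rough look at strand balance.
forward_share = {
    base: fwd / (fwd + rev) for base, (fwd, rev) in strand_counts.items()
}
for base, share in forward_share.items():
    print(f"{base} forward-strand share: {share:.1%}")
```

The strand split is included only because the post reports it. Whether it matters for this call is not something the numbers alone can settle.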

I also tried with GATK Ver 2.2.15, with no change except for a minor difference in the QUAL score for the variant.

Also, please note that the variant is getting a very high QUAL score (32767) but is being filtered out later. So my confusion is based not just on observing a high frequency in IGV, but also on the fact that this variant gets a very high QUAL score with EMIT_ALL_SITES. For the allele frequency reported in the VCF and the likelihoods, please refer to the VCF results I pasted earlier. Please let me know if you need more information regarding this issue.