The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# Unreported Variant despite high QUAL

Posts: 42
edited December 2012

Hello,

I am trying to investigate why a variant (SNP) is not reported in the vcf file when I use the output_mode as "EMIT_VARIANTS_ONLY". However, when I use "EMIT_ALL_SITES" the site shows a very high QUAL score for the variant. Below is the output my region of interest:

17      7578393 .       A       .       11642   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3927
17      7578396 .       G       .       9145    .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910
17      7578397 .       T       .       11392   .       AN=2;DP=3927;MQ=41.68;MQ0=0     GT:DP   0/0:3910


The questionable variant is T to A variant at Chr17:7578394 position.

The command I used is:

 java -Xmx4g -jar /usr/local/lib/GenomeAnalysisTK-2.0-35-g2d70733/GenomeAnalysisTK.jar -R human_g1k_v37.fasta -T UnifiedGenotyper -I CL255.ordered.sorted.realigned.bam -o CL255_SNP.vcf --dbsnp dbsnp_135.b37.vcf -stand_call_conf 30 -stand_emit_conf 0 -dcov 5000 -L "17:7,578,336-7,578,451" -out_mode EMIT_ALL_SITES_
`

I am unable to attach the data since uploading bam seems to be disallowed.

Any help is appreciated.
Thanks

Post edited by Geraldine_VdAuwera on
Tagged:

• Posts: 42

Sorry the formatting of vcf results was bad in my previous post. Below should be better:

17 7578393 . A . 11642 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3927

17 7578396 . G . 9145 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910

17 7578397 . T . 11392 . AN=2;DP=3927;MQ=41.68;MQ0=0 GT:DP 0/0:3910

Hi there,

Re: uploads, the forum only allows uploads of text files and images. For BAM uploads, we have an FTP server (see the FAQs). However, we ask that people only upload BAMs (or BAM snippets) once we've determined that the problem requires it. We'll let you know if it's necessary for your case.

Geraldine Van der Auwera, PhD

It's definitely covered in that article...

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

• Posts: 42

Hi,

Thanks for the suggestions. I did look into the article: Below are the observations for my data:
1) How many overlapping deletions are there at the position?

There are 41 deletions in 3567 reads which is 0.01 %. This is less than default 0.05% for --max_deletion_fraction

2) What do the base qualities look like for the non-reference bases?

The base qualities for the desired variant bases ("A") are good. Only 0.02% are below 17.

3) What do the mapping qualities look like for the reads with the non-reference bases?

The mapping qualities are in the range of 40-42

4) Are there a lot of alternate alleles?

I used the default value for alternate alleles. Below is the breakup of the alleles seen by IGV:

T (Reference) : 1588 : 48% (1125+, 463-)
A (Desired Var) : 1922 : 45% (1647+, 275-)
C: 0
G: 16 : <1%
N: 0
Del: 41
Ins: 14

5) Are you working with SOLiD data?

No. This is 454 data.

I also tried with GATK Ver 2.2.15 with no change except for minor difference in QUAL score for the variant.

Also, please note that variant is getting a very high QUAL score (32767) but is being filtered out later. So my confusion is not based just on observing high frequency in IGV but also that this variant is getting a very very high QUAL score with emit_all_sites. For allele frequency reported in vcf and likelihoods please refer the vcf results snapshot I had pasted earlier. Please let me know if you need more information regarding this issue.