The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# [Screenshot + info] SNP visible in IGV but is not called by UnifiedGenotyper

Member
edited April 2013

Greetings!

First of all, thank you for a truly great toolkit! It is no doubt the best one out there.

Now, I have a question regarding visualization of a SNP that is not called by UG but looks convincing in IGV. Yes, I've looked at the FAQ page gatkforums.broadinstitute.org/discussion/1235/why-didnt-the-unified-genotyper-call-my-snp-i-can-see-it-right-there-in-igv but I'm still not completely convinced that this is a false positive.

The BAM files have gone through the Best Practices workflow prior to SNP calling. Calling was done using UG with subsequent recalibration steps, where I followed the guidelines under gatkforums.broadinstitute.org/discussion/1259/what-vqsr-training-sets-arguments-should-i-use-for-my-specific-project. SNP calling was done using GATK 2.4-9.

Below is a screenshot from IGV showing the SNP call:

Fullsize here: s24.postimg.org/sepow851v/igv_snp.png

The average mapping quality for the reads that include the SNP is 50 and the average base quality at the locus of the SNP is 28.7 (not including 4 positions where base quality is below 10). These values are calculated from the values shown by IGV

Are these values really too low to not confidently call this SNP? I mean a base quality of 28.7 means a probability of 99.87% that the base call is correct. Isn't that good enough?

Tagged:

Ah, I see. Two things:

1. The MQ is the quadratic mean of the mapping quals for all reads at the sites; its value shouldn't be interpreted like a classic phred-scaled mapq value. To give you a sense of scale, when using hard filters instead of VQSR, it is common to filter out variants with MQ < 40. So here, if you have a lot of variants with great MQ, it's possible that the model determined that 48 is below acceptable relative to the rest. Having a small call set (as warned) may contribute to that.

2. Choosing a ts level of 99.0 is actually pretty stringent; it means that you are looking to retrieve "only" 99% of true sites. Again, if you have a lot of high quality variants, that will lead to respectable snps like this one being unfairly filtered out.

So this is potentially a sign that your call set quality is pretty good and you can afford to lower your specificity requirement in favor of more sensitivity. I would recommend changing your ts level to 99.9%. That should rescue this snp while still giving you a good quality call set.

Oh, that does look like a nice clean snp. Two questions:

2. What call and emit thresholds did you use for calling?

• Member
1. Yes, I used 2.4-9 for the SNP calling. The BAM file pre-processing was done using version 2.2-3 (if that makes a difference).

2. I used the default settings, so 30.0 for both call and emit.

Ok, try dropping the emit threshold to 10 and call just that region again. The snp may get called as LowQual, but at least that'll give you the call statistics, which may shed some light on why the UG is not happy with that site.

• Member

Alright, I re-ran the SNP calling with lowered emit threshold. Even though I ran it on the entire chromosome, I still got a warning message about few variants from VariantRecalibrator (if that makes a difference). Anyway, this is what the SNP looks like in the VCF output after ApplyRecalibration:

- ------ rs---- T C 3465.20 VQSRTrancheSNP99.00to99.90 AC=4;AF=1.00;AN=4;DB;DP=110;Dels=0.00;FS=0.000;HaplotypeScore=1.0616;MLEAC=4;MLEAF=1.00;MQ=48.92;MQ0=0;QD=31.50;VQSLOD=-5.803e-01;culprit=MQ GT:AD:DP:GQ:PL 1/1:1,48:48:99:1479,135,0 1/1:0,61:61:99:2013,174,0

So the filter field says "VQSRTrancheSNP99.00to99.90", which means what exactly? The --ts_filter_level option was set to 99.0 so shouldn't this SNP pass the filter? If I understand correctly, this translates to: "The SNP didn't pass the filter because the truth sensitivity level was between 99 and 99.9". Well, I set the threshold to 99.0 so shouldn't this be ok?

The VCF header has the following information:
##FILTER=<ID=VQSRTrancheSNP99.00to99.90,Description="Truth sensitivity tranche level for SNP model at VQS Lod: -11.1528 <= x < 0.132">`

In the INFO field, the culprit is stated as MQ. However, the MQ is at 48.92 which is close to 99.999%, unless I'm mistaken in this case as well.

What is going on here?