GATK VQSR tranches

Hi all - I'm stumped and need your help. I'm following the GATK best practices for calling variants with HaplotypeCaller in GVCF mode. One of my samples is NA12878, among 119 other samples in my cohort. For some reason GATK is missing a bunch of variants in this sample that I can clearly see in IGV but that appear to be missing from the VCF. I discovered that one such variant is actually being filtered out, the reason being VQSRTrancheSNP99.00to99.90. The genotype is homozygous variant, DP is 243, QUAL is 524742.54, and it's known in dbSNP. I suspect this is happening to other variants too.

How do I adjust VQSR, or how tranches are used and which tranche a variant gets placed in? I suppose I need to fine-tune my parameters, but I would think something as obvious as this variant would pass filtering.

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    There should be a "culprit" annotation in there flagging the annotation with the worst scores, which may be responsible for the variant getting filtered. That's supposed to help you figure out what's wrong.
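A minimal sketch of how one might pull the relevant fields out of a recalibrated VCF to see why a site was filtered. It assumes the standard VCF column order (FILTER in column 7, INFO in column 8) and that the INFO field carries `VQSLOD=` and `culprit=` entries; the function name `vqsr_summary` is hypothetical.

```shell
# Hypothetical helper: print CHROM, POS, FILTER, VQSLOD and culprit for each
# record read from stdin (uncompressed VCF text).
vqsr_summary() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      vqslod = "NA"; culprit = "NA"    # reported when the key is absent
      n = split($8, info, ";")         # INFO is a ;-separated list of entries
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^VQSLOD=/)  vqslod  = substr(info[i], 8)
        if (info[i] ~ /^culprit=/) culprit = substr(info[i], 9)
      }
      printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, $7, vqslod, culprit
    }'
}
```

Something like `gzip -dc your.recalibrated.vcf.gz | vqsr_summary` (file name is a placeholder) would then list the filter status next to the culprit for every site, and the output can be narrowed down to the position of interest with `grep` or `awk`.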

  • tommycarstensen United Kingdom Member ✭✭✭
    edited March 2015

    @golharam I noticed your filter is named VQSRTrancheSNP99.00to99.90. I just wanted to let you know that the VQSR best practices for SNPs are:

    --ts_filter_level 99.5 \
    -mode SNP \
    

    According to the VariantRecalibrator (VR) documentation, the default --TStranche values are 100.0, 99.9, 99.0, and 90.0. Perhaps try running VR again with higher granularity, i.e. by adding 99.5 to --TStranche?
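As a rough illustration of the suggestion above, the sketch below builds one `--TStranche` argument per level, so that 99.5 sits alongside the defaults. The helper name `tranche_args` is hypothetical, and the rest of the VariantRecalibrator command line is deliberately left out.

```shell
# Hypothetical helper: turn a list of tranche levels into repeated
# --TStranche arguments (flag name taken from the post above).
tranche_args() {
  for level in "$@"; do
    printf '%s %s ' '--TStranche' "$level"
  done
  printf '\n'
}

# The default levels quoted above, plus the suggested 99.5.
tranche_args 100.0 99.9 99.5 99.0 90.0
```

The printed arguments would be appended to an existing VariantRecalibrator invocation; which levels are worth adding depends on the data.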

    To save you a bit of time you can check the culprits in your .recal.gz file before you apply the recalibration.
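One possible way to do that check, sketched under the assumption that the recal file is VCF-formatted text whose INFO column (column 8) contains a `culprit=<annotation>` entry. The function name `culprit_counts` is hypothetical.

```shell
# Hypothetical helper: count how often each annotation is named as the culprit.
# Reads uncompressed recal/VCF text from stdin.
culprit_counts() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^culprit=/) count[substr(info[i], 9)]++
    }
    END { for (c in count) printf "%d\t%s\n", count[c], c }
  ' | sort -k1,1nr -k2,2               # most frequent culprit first
}
```

Running `gzip -dc your.recal.gz | culprit_counts` (file name is a placeholder) gives a quick picture of which annotation is most often the worst-scoring one, before the recalibration is applied.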

    I hope you are able to avoid filtering out your true variant.

    Is this low-coverage data by any chance? If so, you might be better off using UnifiedGenotyper.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Oh yeah, the culprits are in the recal file. And increased tranche granularity may help (you can specify as many as you want). @tommycarstensen to the rescue :)
