Weird VQSR filtering pattern

Hi

I ran HaplotypeCaller joint calling on 300 normal tissue samples. I followed the Best Practices at every step and got a VQSR plot very similar to this one. Most variants have the same MQ and FS values, which makes the distributions of these features non-Gaussian. Would such a result still be valid? Can anyone share some insight?

Cheng

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @UniCorn

    Please post the version of GATK you are using, the exact command and the plots showing the weird VQSR filtering pattern.

  • UniCorn (US; Member)

    The version is GATK 4.0.8.1.

    The command line is:
    /biocluster/data/bioexec/software/gatk-4.0.8.1/gatk VariantRecalibrator \
    -R $REF \
    -V $input_vcf \
    -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 \
    -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:$ref_dir/hapmap_3.3.hg19.sites.vcf \
    --resource omni,known=false,training=true,truth=false,prior=12.0:$ref_dir/1000G_omni2.5.hg19.sites.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:$dbsnp_vcf \
    -an MQRankSum -an ReadPosRankSum -an BaseQRankSum -an SOR -an MQ -an FS -an DQ \
    -mode SNP \
    -O $outdir/$output_prefix.snp.recal \
    --tranches-file $outdir/$output_prefix.snp.tranches \
    --rscript-file $outdir/$output_prefix.snp.plots.R

    I guess the reason is that MQ, FS and DQ do not follow a Gaussian distribution (each shows a very sharp peak with a small standard deviation). Should I get rid of these non-Gaussian features? Is there a rule of thumb regarding which features should be included? Thanks
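
One way to quantify the "sharp peak" described above is to measure how much of each annotation sits on a single value. The following is a minimal sketch, not part of GATK: it assumes an uncompressed VCF read as text lines with the annotations stored as `KEY=value` pairs in the INFO column, and the helper names `info_values` and `summarize` as well as the demo records are hypothetical.

```python
import statistics
from collections import Counter


def info_values(vcf_lines, key):
    """Collect the numeric values of one INFO annotation from VCF text lines."""
    values = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        for entry in fields[7].split(";"):
            name, _, raw = entry.partition("=")
            if name == key and raw:
                try:
                    values.append(float(raw))
                except ValueError:
                    pass  # ignore multi-valued or non-numeric entries
    return values


def summarize(values):
    """Report the spread of an annotation and how concentrated it is on one value."""
    mode_value, mode_count = Counter(values).most_common(1)[0]
    return {
        "n": len(values),
        "stdev": statistics.pstdev(values),
        "mode": mode_value,
        "mode_fraction": mode_count / len(values),
    }


# Made-up records for illustration only; real input would be the joint-called VCF.
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tMQ=60.00;FS=0.000",
    "chr1\t200\t.\tC\tT\t50\t.\tMQ=60.00;FS=0.000;DB",
    "chr1\t300\t.\tG\tA\t50\t.\tMQ=60.00;FS=0.000",
    "chr1\t400\t.\tT\tC\t50\t.\tMQ=37.50;FS=12.300",
]

for annotation in ("MQ", "FS"):
    print(annotation, summarize(info_values(demo_lines, annotation)))
```

A `mode_fraction` close to 1 together with a small `stdev` would confirm that most variants share one value for that annotation. Whether that justifies dropping the annotation from the `-an` list is a separate question that this sketch does not answer.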

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @UniCorn

    Can you also please post the VQSR plots that show the weird results?