Density plots after hard filtering: good enough for a VQSR resource?

Hello!!
As a first approach, I am trying to hard-filter a variant call set using the following thresholds, taken from a related publication I found: QD < 2.0 | FS > 60.0 | MQ < 40.0 | MQRankSum < -12.5 | ReadPosRankSum < -8.0
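For readers less familiar with how a threshold list like this is applied: a site is flagged if any single condition holds. The minimal sketch below restates that logic in Python. It assumes each site's INFO annotations have already been parsed into a dict, and `THRESHOLDS` and `failed_conditions` are hypothetical names. It treats a missing annotation as not triggering its condition, which I believe matches the default of GATK VariantFiltration (see its `--missing-values-evaluate-as-failing` option), but please check the tool docs for your version.

```python
import operator

# Thresholds quoted in the question: a site FAILS if any condition is true.
THRESHOLDS = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]


def failed_conditions(info):
    """Return the annotation names whose condition fires for one site.

    `info` maps annotation names to floats. An absent annotation
    (e.g. MQRankSum at a site with no heterozygous samples) does not
    trigger its condition (assumed to mirror VariantFiltration's default).
    """
    return [name for name, op, cutoff in THRESHOLDS
            if name in info and op(info[name], cutoff)]


site = {"QD": 1.5, "FS": 3.2, "MQ": 59.8, "ReadPosRankSum": 0.4}
print(failed_conditions(site))  # ['QD']
```

An empty list means the site passes; any non-empty list means it would be tagged as filtered.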
As I understand it from the GATK documentation on hard filtering, there are no fixed rules for telling whether a call set has been (hard-)filtered properly, other than visual inspection of the density plots, which requires experience.
For this reason, I would kindly ask for your input on the following plots, so that I can decide on the next steps.
This is what I would venture:
QD: leave it as it is; enough variants are filtered.
FS: on the log10 scale, leave it as it is; enough variants are filtered.
MQ: may need a more stringent threshold.
MQRankSum: on the log10 scale, leave it as it is; enough variants are filtered.
ReadPosRankSum: on the log10 scale, leave it as it is; enough variants are filtered.
Add SOR > 3.0?
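To put numbers on "enough variants filtered" rather than relying only on the plots, one option is to count how many sites each condition flags, and how many it flags that no other condition catches. This is a minimal sketch. It assumes the per-site annotations have been exported (for example with `gatk VariantsToTable`) and parsed into dicts. `tally_filters` is a hypothetical helper, the example sites are made up, and the SOR > 3.0 entry is the candidate filter being considered above.

```python
import operator
from collections import Counter

# Per-annotation conditions; the last entry is the proposed SOR filter.
CONDITIONS = {
    "QD < 2.0": ("QD", operator.lt, 2.0),
    "FS > 60.0": ("FS", operator.gt, 60.0),
    "MQ < 40.0": ("MQ", operator.lt, 40.0),
    "MQRankSum < -12.5": ("MQRankSum", operator.lt, -12.5),
    "ReadPosRankSum < -8.0": ("ReadPosRankSum", operator.lt, -8.0),
    "SOR > 3.0": ("SOR", operator.gt, 3.0),
}


def tally_filters(sites):
    """Count, per condition, the sites it flags and the sites only it flags."""
    flagged, unique = Counter(), Counter()
    for info in sites:
        # Missing annotations never trigger a condition.
        hits = [label for label, (key, op, cutoff) in CONDITIONS.items()
                if key in info and op(info[key], cutoff)]
        flagged.update(hits)
        if len(hits) == 1:
            unique[hits[0]] += 1
    return flagged, unique


# Made-up sites, for illustration only.
sites = [
    {"QD": 1.0, "FS": 70.0, "SOR": 4.0},
    {"QD": 25.0, "MQ": 35.0},
    {"QD": 20.0, "FS": 2.0, "SOR": 3.5},
    {"QD": 18.0, "FS": 1.0, "SOR": 0.7},
]
flagged, unique = tally_filters(sites)
print(dict(flagged))
print(dict(unique))
```

A condition with a large `flagged` count but a near-zero `unique` count mostly removes sites that other filters already catch. This is one way to judge whether adding SOR > 3.0 would change much.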
I should mention that I want to use this variant call set as a resource for VQSR.
Thanks in advance!
Best Answer
bhanuGandham Cambridge MA admin
Hi @sp580
Here is a document that will help you: "Understanding and adapting the generic hard-filtering recommendations".
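Density plots like the ones in the question are usually kernel density estimates (KDEs) of each annotation. This is a minimal, standard-library-only sketch that evaluates a Gaussian KDE on a grid and reads off the share of the estimated distribution beyond a cutoff. The Silverman rule-of-thumb bandwidth is an assumption, and plotting tools may pick a different one. `silverman_bandwidth`, `kde` and `tail_fraction` are hypothetical helper names, and the QD values are made up.

```python
import math
import statistics


def silverman_bandwidth(values):
    """Silverman's rule of thumb for a Gaussian kernel."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if IQR is 0
    return 0.9 * spread * n ** (-1 / 5)


def kde(values, grid, h):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1.0 / (len(values) * h * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
            for x in grid]


def tail_fraction(values, cutoff, h, upper=True):
    """Probability mass of the KDE above (or below) `cutoff`."""
    def normal_cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    below = sum(normal_cdf((cutoff - v) / h) for v in values) / len(values)
    return 1 - below if upper else below


# Made-up QD values, for illustration only.
qd = [1.2, 1.8, 2.5, 10.0, 12.5, 14.0, 15.2, 16.8, 18.1, 20.3, 22.0, 25.4]
h = silverman_bandwidth(qd)
grid = [i * 0.5 for i in range(0, 81)]
density = kde(qd, grid, h)  # pair with `grid` to plot the curve
share = tail_fraction(qd, 2.0, h, upper=False)
print(f"bandwidth={h:.2f}, share below QD 2.0: {share:.1%}")
```

The tail share comes from summing each kernel's normal CDF, so it does not depend on the plotting grid.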
Answers
Hi @sp580
Here is a doc with suggested hard-filtering steps: https://software.broadinstitute.org/gatk/documentation/article.php?id=2806
If you are going to apply VQSR, you do not need to hard-filter the variants.
PS: You need at least 30 exomes or 1 whole genome to run VQSR.
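For the bootstrapping case discussed in this thread, the hard-filtering steps in the linked document come down to selecting SNPs and then tagging the sites that fail the expression. The sketch below only assembles GATK4 command lines as Python argument lists and does not run them. The file names and the filter name `snp_hard_filter` are hypothetical placeholders. The `||` joining follows the JEXL syntax used in the GATK hard-filtering docs. Check the flags against `gatk SelectVariants --help` and `gatk VariantFiltration --help` for your GATK version.

```python
import shlex

# Hypothetical placeholder paths.
REF = "reference.fasta"
RAW = "raw_variants.vcf.gz"
SNPS = "raw_snps.vcf.gz"
FILTERED = "filtered_snps.vcf.gz"

# Thresholds from the question, combined with JEXL's logical OR.
SNP_EXPR = " || ".join([
    "QD < 2.0", "FS > 60.0", "MQ < 40.0",
    "MQRankSum < -12.5", "ReadPosRankSum < -8.0",
])

# Step 1: keep only SNPs from the raw call set.
select_snps = ["gatk", "SelectVariants", "-R", REF, "-V", RAW,
               "--select-type-to-include", "SNP", "-O", SNPS]

# Step 2: tag (not remove) sites matching the expression.
filter_snps = ["gatk", "VariantFiltration", "-R", REF, "-V", SNPS,
               "--filter-expression", SNP_EXPR,
               "--filter-name", "snp_hard_filter", "-O", FILTERED]

for cmd in (select_snps, filter_snps):
    print(shlex.join(cmd))  # shell-quoted form, for copy-pasting
```

To actually run them, each list can be passed to `subprocess.run(cmd, check=True)`. Keeping the arguments as lists avoids quoting problems with the spaces and `<`/`>` characters in the expression.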
Hi @bhanuGandham
Thanks for getting back to me so quickly.
I actually followed the steps in the document you pointed out.
Regarding the amount of data I am working with, I have 60 whole genomes (~20x).
Also, I forgot to mention that I am working with mouse, and I am following the recommendations for non-human organisms found here https://software.broadinstitute.org/gatk/documentation/article.php?id=1259, where it is stated that:
So even though I will use VQSR, I first need to define a custom resource by hard-filtering the SNPs produced in an initial run. The plots shown in this post were produced from such a data set.
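For the bootstrapping step described here, the resource would be the subset of SNPs that passed the hard filters. This is a minimal sketch. It assumes an uncompressed VCF that has already been through VariantFiltration, so failing sites carry a filter name in the FILTER column and passing ones carry `PASS`. It copies the header through and keeps SNP records whose FILTER is `PASS`. `is_snp` and `write_pass_snps` are hypothetical helpers. In practice, `gatk SelectVariants` with `--exclude-filtered` covers similar ground.

```python
import io


def is_snp(ref, alt):
    """True if REF and every ALT allele are single A/C/G/T bases."""
    bases = set("ACGT")
    return ref in bases and all(a in bases for a in alt.split(","))


def write_pass_snps(vcf_in, vcf_out):
    """Copy header lines plus SNP records whose FILTER is PASS.

    Both arguments are text file objects for an uncompressed VCF.
    Records with FILTER "." (never filtered) are dropped too.
    Returns the number of records kept.
    """
    kept = 0
    for line in vcf_in:
        if line.startswith("#"):
            vcf_out.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, filt = fields[3], fields[4], fields[6]
        if filt == "PASS" and is_snp(ref, alt):
            vcf_out.write(line)
            kept += 1
    return kept


# Tiny made-up VCF: one passing SNP, one filtered SNP, one passing indel.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\n"
    "1\t200\t.\tC\tT\t12\tsnp_hard_filter\t.\n"
    "1\t300\t.\tAT\tA\t60\tPASS\t.\n"
)
out = io.StringIO()
print(write_pass_snps(example, out))  # 1
```

The resulting file would still need to be indexed (for example with `gatk IndexFeatureFile`) before GATK tools accept it as a resource.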
I want to know whether applying these filters will produce an appropriate custom resource for VQSR.
Thanks in advance!