QD Distribution for sequence capture data
I am working with sequence capture data from a non-model organism (based on a de novo genome). Our goal is to get the site frequency spectrum (SFS) for use in demographic inference, so the number of singleton mutations is important to us.
I am using the recommended hard filters, and am losing 50,000 variants to the QD < 2.0 filter. I would like your advice on whether that filter is appropriate for sequence capture data, since I know the normal DP filters are not appropriate with capture data. My QD distribution looks very different from the QD distribution shown here.
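To make the comparison between the two thresholds concrete, here is a minimal sketch of how one could tally the records that fail QD < 2.0 versus QUAL < 30 in a VCF. It is illustrative only and is not the script behind the numbers in this post. It assumes an uncompressed, tab-delimited VCF whose INFO column carries a QD annotation; the function names `parse_info` and `compare_filters` are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=30;QD=1.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-style entry with no value
    return info


def compare_filters(vcf_lines, qd_min=2.0, qual_min=30.0):
    """Count records failing each threshold, and those failing QD alone."""
    counts = {"total": 0, "fail_qd": 0, "fail_qual": 0, "fail_qd_only": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])
        if fields[5] == "." or "QD" not in info:
            continue  # records without QUAL or QD cannot be compared
        qual = float(fields[5])
        qd = float(info["QD"])
        counts["total"] += 1
        if qd < qd_min:
            counts["fail_qd"] += 1
        if qual < qual_min:
            counts["fail_qual"] += 1
        if qd < qd_min and qual >= qual_min:
            counts["fail_qd_only"] += 1
    return counts
```

Passing an open file handle (for example `compare_filters(open("variants.vcf"))`, where the file name is a placeholder) returns the four counts. The `fail_qd_only` count is the set of variants that the QD filter removes but the QUAL filter keeps.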
When I use a straight QUAL < 30 filter, I get many more singletons in my SFS, some of which are probably false positives, but I am not sure what proportion. (Figure shows use of QUAL filter in pink, and QD outlined in blue).
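For readers who want to check how a filter choice changes the singleton class, the sketch below shows one way to count alternate-allele copies per site from the genotype columns and build the resulting spectrum. It is a simplified illustration under stated assumptions, not the method used for the figure: it assumes biallelic sites, GT as the first FORMAT entry, and no folding or projection for missing data. The names `alt_allele_count` and `alt_count_spectrum` and the `keep` predicate are hypothetical.

```python
from collections import Counter


def alt_allele_count(sample_fields):
    """Count copies of the ALT allele across GT entries like '0/1' or '1|1'."""
    count = 0
    for sample in sample_fields:
        gt = sample.split(":", 1)[0]  # GT is assumed to be the first FORMAT key
        for allele in gt.replace("|", "/").split("/"):
            if allele == "1":
                count += 1
    return count


def alt_count_spectrum(vcf_lines, keep=lambda fields: True):
    """Map k -> number of kept biallelic sites carrying k ALT copies."""
    spectrum = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if "," in fields[4]:
            continue  # skip multiallelic sites in this simple sketch
        if not keep(fields):
            continue  # site removed by the caller-supplied filter
        k = alt_allele_count(fields[9:])
        if k > 0:
            spectrum[k] += 1
    return spectrum


def qual_at_least_30(fields):
    """Example predicate: keep sites with QUAL >= 30."""
    return fields[5] != "." and float(fields[5]) >= 30.0
```

Calling `alt_count_spectrum(lines)[1]` and `alt_count_spectrum(lines, keep=qual_at_least_30)[1]` gives the singleton counts without and with the QUAL filter, and a QD-based predicate can be swapped in the same way to compare the two.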
Do you have any recommendations for adjusting the QD filter for use with sequence capture data, or for QD distributions that look like mine?
Thanks so much for your help!
GATK version 3.7
Best Practices (though without VQSR, since I am working with a de novo genome from a non-model organism and don't have a good set of trusted SNPs)
Mean coverage: 25-35x