Feedback on approach to create a custom truth set for VQSR

sp580, Germany, Member


I would like to ask for feedback on my approach to constructing a truth set for VQSR, since no such resource exists for my species.

What I am doing is:
1/ Call variants following the GATK Best Practices, with joint calling by GenotypeGVCFs
2/ Call variants with a second caller (samtools mpileup -> bcftools call)
3/ Filter each call set, retaining only sites at which every sample has a depth of at least 10 (DP>=10) and a genotype quality of at least 30 (GQ>=30) in the FORMAT fields
4/ Use the retained sites common to both callers as the truth set for VQSR

My reasoning is that sites called by two different algorithms, with GQ>=30 and DP>=10 in all samples of the cohort, are very likely to be true variants, and their annotations can be used to learn what a good variant looks like.

I would like to know whether my reasoning makes sense to you and, if so, what you would suggest I change, add, or remove. For example, I am not completely convinced about retaining a site only if all samples meet the minimum GQ and DP. What if only one sample passes the condition?
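To make the steps concrete, here is a minimal Python sketch of the filtering and intersection logic only. It is not a real pipeline: it assumes the calls have already been parsed into dictionaries, that sites from the two callers can be matched on (chromosome, position, REF, ALT), and that a missing DP or GQ counts as a failure. The names `sample_passes`, `site_passes`, `retained_sites`, `truth_set` and the `min_passing` parameter are made up for illustration; `min_passing` is there only to show how the "all samples" rule could be relaxed.

```python
from typing import Dict, List, Optional, Set, Tuple

# Site key: (chromosome, position, REF, ALT). Assumes both callers
# represent the same variant identically, which may need normalization.
Site = Tuple[str, int, str, str]
# Per-sample FORMAT values; None stands for a missing value (".").
Sample = Dict[str, Optional[int]]


def sample_passes(sample: Sample, min_dp: int = 10, min_gq: int = 30) -> bool:
    """True if the sample has DP >= min_dp and GQ >= min_gq."""
    dp, gq = sample.get("DP"), sample.get("GQ")
    if dp is None or gq is None:
        return False  # assumption: missing values fail the filter
    return dp >= min_dp and gq >= min_gq


def site_passes(samples: List[Sample], min_passing: Optional[int] = None) -> bool:
    """True if enough samples pass; None means every sample must pass."""
    if not samples:
        return False
    required = len(samples) if min_passing is None else min_passing
    return sum(sample_passes(s) for s in samples) >= required


def retained_sites(calls: Dict[Site, List[Sample]],
                   min_passing: Optional[int] = None) -> Set[Site]:
    """Step 3: keep the sites of one call set that pass the filter."""
    return {site for site, samples in calls.items()
            if site_passes(samples, min_passing)}


def truth_set(calls_a: Dict[Site, List[Sample]],
              calls_b: Dict[Site, List[Sample]],
              min_passing: Optional[int] = None) -> List[Site]:
    """Step 4: sites retained in both call sets, in sorted order."""
    return sorted(retained_sites(calls_a, min_passing)
                  & retained_sites(calls_b, min_passing))


# Toy data, invented for this example (two samples per site).
gatk_calls = {
    ("chr1", 100, "A", "G"): [{"DP": 12, "GQ": 40}, {"DP": 15, "GQ": 35}],
    ("chr1", 200, "C", "T"): [{"DP": 12, "GQ": 40}, {"DP": 8, "GQ": 50}],
    ("chr2", 50, "G", "A"): [{"DP": 20, "GQ": 99}, {"DP": 30, "GQ": 99}],
}
bcftools_calls = {
    ("chr1", 100, "A", "G"): [{"DP": 11, "GQ": 31}, {"DP": 10, "GQ": 30}],
    ("chr1", 200, "C", "T"): [{"DP": 14, "GQ": 45}, {"DP": 9, "GQ": 60}],
}

strict = truth_set(gatk_calls, bcftools_calls)                  # all samples
relaxed = truth_set(gatk_calls, bcftools_calls, min_passing=1)  # >= 1 sample
print(strict)
print(relaxed)
```

With the toy data, the strict rule keeps only the chr1:100 site, while `min_passing=1` also keeps chr1:200, where one sample falls below DP 10 in both call sets. The chr2:50 site is dropped either way because only one caller reports it.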

I greatly appreciate your feedback. Thanks in advance!

