VQSR on exome of small specific population

mayaab (Member, Israel)

I asked this question at the workshop in Brussels, and I would like to post it here as well:
I'm working on an exome analysis of a trio, and I would like to filter the data with VQSR. Since this is an exome project, there are not many variants, and therefore, as I understand it, VQSR will not be accurate. You suggest adding more data from 1000 Genomes or other published datasets.
The families I'm working on belong to a very small and specific population, and I'm afraid that adding published data will add a lot of noise.
What do you think: should I add more published data, change parameters such as maxGaussians, or use hard filtering instead?




  • Geraldine_VdAuwera (Administrator, Broadie; Cambridge, MA)

    Hi Maya,

    I've thought about your case some more and I think you would still benefit from adding exomes from the 1kG dataset to your cohort. While this won't do anything to boost discovery of any rare variants that are unique to your population, it will allow you to use VQSR, which is always a good thing. It won't reduce sensitivity to the rare variants, and it won't increase the false positive discovery rate, if that's what you're worried about. Then, once you have recalibrated variants for the whole cohort, you can subset out the variants found in your own samples to get rid of all the stuff you're not interested in.

  • mayaab (Member, Israel)

    Thanks. I don't understand how it won't reduce sensitivity to rare variants and won't increase false positives. Can you explain?
    One more important thing: I have very few samples, just the two parents and a child, so only three in total. That means the 1kG data will make up the majority of the cohort.

  • Geraldine_VdAuwera (Administrator, Broadie; Cambridge, MA)

    VQSR does not track how many samples in a cohort show evidence for the variant. It performs site-level, not sample-level, filtering, using annotations that are not expected to be influenced by the frequency of variants in the population. What it effectively does is detect trends in the data that indicate artifactual calls, and filter those out. As long as the annotations of rare variants follow the same distribution as the rest (and there is no reason I can think of why they would not), they will be treated the same way. Variants do not need to be represented in the truth and training resources in order to be classified appropriately; if they did, it would defeat the purpose of the modeling.

    Does this help?

  • mayaab (Member, Israel)

    OK, now I understand. Thanks, Geraldine.
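
The workflow discussed in this thread (pool the trio with 1kG exomes, filter at the site level, then subset back to the trio) can be illustrated with a toy Python sketch. This is not the VQSR algorithm and does not call GATK: a single diagonal Gaussian fitted to invented training annotations stands in for the model VQSR fits, and the sample names, site IDs, annotation values, and score cutoff are all hypothetical. The only point it tries to show is that the score is computed from site annotations alone, so a rare variant and a common variant with the same annotations are treated identically.

```python
import math

# Hypothetical pooled cohort: a trio plus three 1kG-style samples.
SAMPLES = ["father", "mother", "child", "kg_1", "kg_2", "kg_3"]
TRIO = ["father", "mother", "child"]

# Invented sites. QD and FS are used here as example site-level annotations;
# GT holds one genotype per sample, in SAMPLES order.
SITES = [
    {"id": "common_good", "QD": 20.0, "FS": 1.0,
     "GT": ["0/1", "0/0", "0/1", "0/1", "1/1", "0/1"]},
    {"id": "rare_good", "QD": 20.0, "FS": 1.0,
     "GT": ["0/1", "0/0", "0/0", "0/0", "0/0", "0/0"]},
    {"id": "artifact", "QD": 2.0, "FS": 40.0,
     "GT": ["0/1", "0/1", "0/1", "0/0", "0/0", "0/0"]},
    {"id": "kg_only", "QD": 19.0, "FS": 2.0,
     "GT": ["0/0", "0/0", "0/0", "0/1", "0/0", "0/1"]},
]

# Invented (QD, FS) values of trusted training sites.
TRAINING = [(18.0, 0.5), (22.0, 1.5), (20.0, 2.0), (19.0, 1.0), (21.0, 0.0)]


def fit_gaussian(rows):
    """Fit a per-annotation mean and variance (one diagonal Gaussian)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    variances = [sum((r[d] - means[d]) ** 2 for r in rows) / n
                 for d in range(dims)]
    return means, variances


def site_score(site, model):
    """Log-density of the site's annotations; genotypes are never read."""
    means, variances = model
    values = (site["QD"], site["FS"])
    return sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
               for x, m, v in zip(values, means, variances))


def carries_alt(gt):
    """True if the genotype string contains at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def subset_to_samples(sites, all_samples, keep):
    """Keep only the chosen samples, dropping sites none of them carries."""
    idx = [all_samples.index(s) for s in keep]
    out = []
    for site in sites:
        gts = [site["GT"][i] for i in idx]
        if any(carries_alt(g) for g in gts):
            out.append({**site, "GT": gts})
    return out


model = fit_gaussian(TRAINING)
THRESHOLD = -10.0  # arbitrary cutoff chosen for this toy data
passing = [s for s in SITES if site_score(s, model) >= THRESHOLD]
trio_sites = subset_to_samples(passing, SAMPLES, TRIO)
print([s["id"] for s in trio_sites])
```

In this toy data, `rare_good` is carried by one sample and `common_good` by five, yet both receive the same score because their annotations are identical. The site seen only in the extra samples passes the filter but disappears when the call set is subset back to the trio. In a real GATK workflow the subsetting step would be done with a tool such as SelectVariants rather than hand-written code.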
