A way to come up with a "truth set" to use with VQSR

RP1 (Member)

Dear GATK Team,

I have a question regarding finding cutoffs for hard filtering. I am working with yeast, for which we do not have a good set of known true variants. I am following the Best Practices and have done joint genotyping of my samples. To give some idea, my samples are yeast clones isolated from a population at different time points. I was wondering whether I can select the subset of variants that are shared among more than two samples (and thus more likely to be correct) to use as my "truth set", and then use the VQSR pipeline instead of hard filtering. Am I doing something obviously wrong with this approach?

Thank you,
Ramya
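
The selection step described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `carriers` and `shared_sites`, the sample names, and the example records are all hypothetical, and the input is assumed to be an uncompressed, tab-delimited, joint-genotyped VCF with a GT field in every record. A site counts as "shared" when at least three samples (more than two) carry a non-reference allele.

```python
from typing import Iterable, List


def carriers(vcf_line: str) -> int:
    """Count samples whose GT has at least one non-reference, non-missing allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # FORMAT is column 9
    count = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        alleles = gt.replace("|", "/").split("/")  # handles phased and haploid GTs
        if any(a not in ("0", ".") for a in alleles):
            count += 1
    return count


def shared_sites(lines: Iterable[str], min_carriers: int = 3) -> List[str]:
    """Keep header lines plus records carried by at least min_carriers samples."""
    return [
        line
        for line in lines
        if line.startswith("#") or carriers(line) >= min_carriers
    ]


# Invented example records (hypothetical samples s1..s4).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\ts4\n",
    "chrI\t100\t.\tA\tG\t50\t.\t.\tGT\t0/1\t1/1\t0/1\t0/0\n",
    "chrI\t200\t.\tC\tT\t50\t.\t.\tGT\t0/1\t0/0\t./.\t0/0\n",
]
selected = shared_sites(example)
```

Here only the record at position 100 survives, because three samples carry the alternate allele. Recurrence across samples does not by itself rule out systematic artifacts, which can also recur, so a subset built this way is a candidate training set rather than a validated one.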

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @RP1

    We do not work with yeast samples, so unfortunately I will not be able to answer that question. It might be better suited to the community-driven Zoo and Garden section of the forum: https://gatkforums.broadinstitute.org/gatk/categories/zoo-garden. I am sorry I wasn't more helpful.

  • RP1 (Member)

    Dear BhanuGandham,
    Thank you for the response. However, the identity of the species is irrelevant to my question, except that I do not have a reliable "truth set" from an independent source.

    Let me rephrase my question: after joint genotyping, if we can identify a subset of variants that we are confident were called correctly, can that subset be used as a "truth set" to train the VQSR model, with the model then applied to the entire call set? Please note that in such a case I will have only "true positives", and no "true negatives", to train my model. I was wondering whether I am making any obvious mistakes with this approach.

    Thank you,
    Ramya
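
The positive-only setup described in this post can be illustrated with a toy model: fit a distribution to the annotation values of the trusted subset, then score every call by how well it fits that distribution. The sketch below is a deliberately simplified, single-Gaussian stand-in and is not GATK's VariantRecalibrator. The function names are hypothetical and the annotation values are invented; only the annotation names QD and FS are taken from GATK.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Annotations = Dict[str, float]


def fit_gaussian(training: List[Annotations]) -> Dict[str, Tuple[float, float]]:
    """Estimate an independent (mean, std) per annotation from trusted sites only."""
    model = {}
    for name in training[0]:
        values = [site[name] for site in training]
        model[name] = (mean(values), pstdev(values) or 1e-6)  # avoid zero std
    return model


def log_density(site: Annotations, model: Dict[str, Tuple[float, float]]) -> float:
    """Log-likelihood of one site under the fitted per-annotation Gaussians."""
    total = 0.0
    for name, (mu, sd) in model.items():
        z = (site[name] - mu) / sd
        total += -0.5 * z * z - math.log(sd) - 0.5 * math.log(2 * math.pi)
    return total


# Invented values: a trusted subset and two calls from the full set.
trusted = [
    {"QD": 20.0, "FS": 1.0},
    {"QD": 22.0, "FS": 2.0},
    {"QD": 18.0, "FS": 1.5},
]
model = fit_gaussian(trusted)
typical_call = {"QD": 21.0, "FS": 1.2}
atypical_call = {"QD": 2.0, "FS": 60.0}
scores = [log_density(typical_call, model), log_density(atypical_call, model)]
```

The call whose annotations resemble the trusted subset scores higher than the one that does not, without any labelled false positives being supplied. A model trained this way can only be as representative as the trusted subset, so a biased subset would bias the scores.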

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited June 24

    Hi @RP1

    Take a look at this doc: https://software.broadinstitute.org/gatk/documentation/article?id=11097

