VQSR on single exome

blueskypy ✭✭ Member

Hi Geraldine,
Thanks for the webinar! You mentioned that VQSR isn't necessary for a single exome. But would there be any drawback to running it on a single exome? I see that it helps to set up the PASS filter.



  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Ah, glad to hear you caught the webinar.

    It's not that it's unnecessary; it's that VQSR won't work properly on a single exome, because a single exome doesn't provide enough data to train the model. The upcoming random forests implementation should help with that, though it remains to be seen how small a dataset it will handle.

  • blueskypy ✭✭ Member

    So do you mean running VQSR on a single exome would lower the calling quality? I did that for some single exomes; should I discard those results and use the VCF files straight from GenotypeGVCFs instead? Also, could you recommend common criteria for hard filtering?

  • blueskypy ✭✭ Member

    Hi Geraldine, would you please help me with this question?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    Sure. The idea here is that a single exome typically doesn't have enough variants to train the model properly. It'll run, but results may not be as good as they could be. Our recommendation for dealing with exomes, if you don't have a large cohort, is to include other exomes in your analysis. For example, you can get exomes from the 1000 Genomes project that match your samples (we try to match them up by ethnicity) to beef up your cohort, up to 30 samples. Or you can group whatever exomes you have in hand, even if they're not part of the same project. It's better to do that than to try to hard-filter your variants.

    We're hoping that the new implementation (coming out in 3.2) will bypass this requirement to a large extent.

    Does this clarify things a little?

  • blueskypy ✭✭ Member
    edited April 2014

    Hi Geraldine,
    Sorry for responding late! I was occupied by a two-day meeting.
    Thanks for the clarification! I'd like to confirm that VQSR only changes the variant quality scores and not the calls themselves. Is that right? I mean, it won't change a variant from A to T, or add or remove variants.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie

    No worries -- that's correct, just keep in mind that the second step does change the FILTER field.
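
Editor's note: the last exchange (the calls stay the same, but the second VQSR step changes the FILTER field) can be checked on your own files by comparing the VCF from before recalibration with the one after it. Below is a minimal Python sketch of that check, not a GATK tool. The helper names (`make_line`, `parse_records`, `compare_calls`), the two tiny in-memory VCFs, and the `TRANCHE_FAIL` filter label are all hypothetical and exist only for illustration; real files will carry whatever filter names your run assigns.

```python
# Sketch: confirm two VCFs contain the same calls and differ only in FILTER.
# All data below is made-up illustration data, not real GATK output.

def make_line(chrom, pos, ref, alt, filt):
    """Build one tab-separated VCF data line (ID, QUAL, INFO are placeholders)."""
    return "\t".join([chrom, str(pos), ".", ref, alt, "50", filt, "."])

def parse_records(vcf_text):
    """Map (CHROM, POS, REF, ALT) -> FILTER for every data line in a VCF."""
    records = {}
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, filt = line.split("\t")[:7]
        records[(chrom, int(pos), ref, alt)] = filt
    return records

def compare_calls(before_text, after_text):
    """Return (same_sites, changed_filters) for two VCF texts."""
    before = parse_records(before_text)
    after = parse_records(after_text)
    same_sites = before.keys() == after.keys()
    changed_filters = {
        site: (before[site], after[site])
        for site in before
        if site in after and before[site] != after[site]
    }
    return same_sites, changed_filters

header = ["##fileformat=VCFv4.1", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]

# Hypothetical input to the filtering step: FILTER not yet set (".").
before_vcf = "\n".join(header + [
    make_line("1", 1000, "A", "T", "."),
    make_line("1", 2000, "G", "C", "."),
])

# Hypothetical output: same sites and alleles, FILTER now filled in.
after_vcf = "\n".join(header + [
    make_line("1", 1000, "A", "T", "PASS"),
    make_line("1", 2000, "G", "C", "TRANCHE_FAIL"),
])

same_sites, changed_filters = compare_calls(before_vcf, after_vcf)
print("same calls:", same_sites)
for site, (old, new) in sorted(changed_filters.items()):
    print(site, old, "->", new)
```

To run this on real files, read each VCF's text and pass it to `compare_calls` in place of the in-memory strings. If `same_sites` comes back false, the two files do not describe the same set of calls and should be looked at more closely.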
