VQSR and single sample processing

Hi guys,
we have a database-centric exome SNP-calling pipeline here that gains new samples over time, so until now we have called SNPs on single samples.
As far as I understand your docs, this conflicts with VQSR, which seems to be designed for multi-sample VCF files.

Is there any recommended practice for single-sample files? Will the approach work reliably at all, or do we have to combine, let's say, subsets of our samples to get good results?

Thanks for your help!
Johannes

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Johannes,

    VQSR does indeed work better when run on calls from multiple samples, simply because having more data yields more accurate models. So combining subsets of data is generally recommended as a good way to empower the VQSR process. However, there are two big caveats to this. One is that the samples should be called together -- it is not enough to simply combine calls from separate calling runs. The second is that the samples you combine should be part of a coherent cohort. Ideally this is built upfront into the experimental design.

  • johanneskoester (Member)

    Hi,

    Thanks for the answer. To decide which samples I can combine, in what sense should they be coherent? Tissue, sequencing machine, run, exome capture kit, library prep?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    In as many ways as possible. Sequencing technology and method on the one hand, genetics of the individuals (ethnic background, clinical focus if any) on the other. The idea is that the recalibrator attempts to identify patterns in the properties of variants. So you should avoid grouping samples that were treated (prepped, sequenced, etc.) differently, because their error modes will differ and dilute the patterns, and you should avoid grouping individuals that do not have traits of interest in common. Does that make sense?
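
One way to put the "coherent cohort" advice into practice in a database-centric pipeline is to bucket samples by their processing and background metadata before deciding which ones to call together. The following is only a minimal sketch: the field names (`platform`, `capture_kit`, `library_prep`, `population`), the sample records, and the helper `group_into_cohorts` are all hypothetical and not part of GATK. It only proposes candidate groups; it does not run any calling or VQSR step.

```python
from collections import defaultdict

# Hypothetical metadata fields that define a "coherent" cohort.
# Adjust to whatever your own sample database actually records.
COHORT_KEYS = ("platform", "capture_kit", "library_prep", "population")


def group_into_cohorts(samples, keys=COHORT_KEYS):
    """Group sample IDs by identical values for all of the given keys."""
    cohorts = defaultdict(list)
    for sample in samples:
        signature = tuple(sample[k] for k in keys)
        cohorts[signature].append(sample["id"])
    return dict(cohorts)


# Made-up sample records, for illustration only.
samples = [
    {"id": "S1", "platform": "platform_A", "capture_kit": "kit_1",
     "library_prep": "prep_X", "population": "pop_1"},
    {"id": "S2", "platform": "platform_A", "capture_kit": "kit_1",
     "library_prep": "prep_X", "population": "pop_1"},
    {"id": "S3", "platform": "platform_B", "capture_kit": "kit_2",
     "library_prep": "prep_X", "population": "pop_1"},
]

cohorts = group_into_cohorts(samples)
for signature, ids in cohorts.items():
    print(signature, ids)
```

Samples that end up in the same bucket are candidates for being called together; buckets with very few samples may still be too small to give the recalibrator enough data.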
