VQSR and single sample processing

Hi guys,
We have a database-centric exome SNP-calling pipeline here that gains new samples over time, so until now we have called SNPs on single samples.
As far as I understand your docs, this conflicts with VQSR, since it seems to be designed for multi-sample VCF files.

Is there any recommended practice for single-sample files? Will the approach work reliably at all, or do we have to combine, let's say, subsets of our samples to get good results?

Thanks for your help!
Johannes

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Johannes,

    VQSR does indeed work better when run on calls from multiple samples, simply because having more data yields more accurate models. So combining subsets of data is generally recommended as a good way to empower the VQSR process. However, there are two big caveats to this. One is that the samples should be called together -- it is not enough to simply combine calls from separate calling runs. The second is that the samples you combine should be part of a coherent cohort. Ideally this is built upfront into the experimental design.

  • Hi,

    Thanks for the answer. To decide which samples I can combine: in what sense should they be coherent? Tissue, sequencing machine, run, exome capture kit, library prep?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    In as many ways as possible: sequencing technology and method on the one hand, genetics of the individuals (ethnic background, clinical focus if any) on the other. The recalibrator attempts to identify patterns in the properties of variants, so you should avoid grouping samples that were treated (prepped, sequenced, etc.) differently, because their error modes will differ and dilute the patterns. You should also avoid grouping individuals that do not have traits of interest in common. Does that make sense?
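For a pipeline that pulls samples from a database, the coherence criteria above could be turned into a simple grouping step. The following is only a minimal sketch, not a GATK tool or an official recommendation: the metadata field names (`platform`, `capture_kit`, `library_prep`, `population`), the sample records, and the `min_size` threshold are all hypothetical and would need to be adapted to whatever your database actually stores.

```python
from collections import defaultdict

# Hypothetical metadata fields; adapt to what your sample database records.
COHORT_KEYS = ("platform", "capture_kit", "library_prep", "population")


def group_into_cohorts(samples, keys=COHORT_KEYS):
    """Group sample IDs that share the same value for every key."""
    cohorts = defaultdict(list)
    for sample in samples:
        signature = tuple(sample[k] for k in keys)
        cohorts[signature].append(sample["id"])
    return dict(cohorts)


def split_by_size(cohorts, min_size):
    """Separate cohorts with at least min_size samples from smaller ones."""
    large = {k: v for k, v in cohorts.items() if len(v) >= min_size}
    small = {k: v for k, v in cohorts.items() if len(v) < min_size}
    return large, small


# Made-up example records, purely for illustration.
samples = [
    {"id": "S1", "platform": "platform_A", "capture_kit": "kit_1",
     "library_prep": "prep_X", "population": "pop_1"},
    {"id": "S2", "platform": "platform_A", "capture_kit": "kit_1",
     "library_prep": "prep_X", "population": "pop_1"},
    {"id": "S3", "platform": "platform_A", "capture_kit": "kit_2",
     "library_prep": "prep_X", "population": "pop_1"},
]

cohorts = group_into_cohorts(samples)
# min_size=2 is an arbitrary placeholder, not a recommended cohort size.
large, small = split_by_size(cohorts, min_size=2)
```

Here S1 and S2 end up in the same candidate cohort, while S3 is set aside because it used a different capture kit. Samples within a candidate cohort would then still need to be called together, as noted in the first answer, rather than merged from separate calling runs.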
