VQSR for multiple small target sequencing samples

We are developing an analysis pipeline for our small targeted-sequencing samples (the coding exons of 30 genes). We have 84 samples, sequenced on one lane of a GAII. There are around 100 to 200 variants per sample. According to the best practices, the way to do this is to call variants on all 84 samples together, then run VQSR on the single resulting VCF for soft filtering. I suspect the number of variants (probably still only hundreds) will not be enough to train the model. So instead of calling variants on the 84 samples together, I called variants on each sample separately and then ran VQSR on the 84 VCF files. VQSR ran through and gave better results than hard filtering. Is my approach reasonable?

Thanks in advance for any comments.

Best Answers


  • ebanks, Broad Institute (Member, Broadie, Dev) ✭✭✭✭

    No, this is not at all right. It goes against everything we recommend in the best practices.
    You should call all of your samples together, not separately, and run hard filtering instead of VQSR.
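For reference, hard filtering amounts to fixed cutoffs on per-variant annotations. The following is a minimal Python sketch, assuming the generic SNP thresholds commonly quoted in GATK's hard-filtering guidance (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). The thresholds are not taken from this thread and would likely need tuning for a deeply sequenced targeted panel; the function name and the dictionary layout are hypothetical.

```python
# Illustrative SNP hard-filter cutoffs (an assumption, not from this thread).
# Each entry maps an annotation name to a test that is True when the value fails.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(info):
    """Return the names of annotations that fail their cutoff.

    `info` is a dict of annotation name -> numeric value for one variant.
    Annotations missing from `info` are skipped rather than failed.
    """
    return [name for name, fails in SNP_FILTERS.items()
            if name in info and fails(info[name])]


# Toy annotation values, made up for illustration.
good = {"QD": 18.3, "FS": 1.2, "MQ": 59.8, "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
bad = {"QD": 1.1, "FS": 75.0, "MQ": 59.8}

print(failed_filters(good))  # []
print(failed_filters(bad))   # ['QD', 'FS']
```

A variant with an empty result would pass; any non-empty result would be recorded as the filter reason.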

  • ying_sheng_1 (Member) ✭✭

    Thanks for the quick answer. Yes, I am not following the best practices. However, I think this is a special case. All the samples are from unrelated individuals. Each sample was sequenced very deeply (over 1000x), so the variants called in each sample should be reliable. Variants shared among different samples shouldn't be treated as redundant. This could be a way to boost the number of variants available for these small samples. The diagnostic plots and results look OK, and they are better than the results from hard filtering. So what is the advantage of multi-sample variant calling? Or what adverse consequences could come from this approach? Thanks a lot.
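One way to make the question about shared variants concrete is to compare the number of pooled records with the number of distinct sites across the per-sample call sets. The Python sketch below uses made-up toy records, and the helper name is hypothetical. It only counts; whether repeated sites behave as independent training points in VQSR is not something this thread confirms.

```python
def site_counts(per_sample_records):
    """Count pooled records and distinct sites.

    `per_sample_records` maps a sample name to a list of
    (chrom, pos, ref, alt) tuples called in that sample.
    Returns (total_records, distinct_sites).
    """
    total = sum(len(recs) for recs in per_sample_records.values())
    distinct = len({site
                    for recs in per_sample_records.values()
                    for site in recs})
    return total, distinct


# Toy data, made up for illustration: one site is shared by all three samples.
toy = {
    "S1": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
    "S2": [("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")],
    "S3": [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")],
}

total, distinct = site_counts(toy)
print(total, distinct)  # 6 3
```

A large gap between the two numbers would suggest that pooling adds many records but little new information about distinct sites.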

  • ying_sheng_1 (Member) ✭✭

    I am sorry, I may not have explained this clearly. When I ran VariantRecalibrator, I gave it multiple raw SNP VCF files as input. So I am not calling variants on multiple samples together, but I am using data from multiple samples to train the Gaussian mixture model. In the last step, ApplyRecalibration, I applied the model (the recal and tranches files) to each single-sample VCF. The only issue is that although I chose 99% sensitivity in ApplyRecalibration, I cannot reach that for a single sample. This might be because the training and the application are on different sets. I am just not sure whether this approach is reasonable.
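The gap between the requested sensitivity and what a single sample shows can be illustrated with a toy calculation: if the score cutoff is chosen so that a target fraction of the pooled truth-site variants pass, the fraction passing within any one sample need not match that target. The Python sketch below uses synthetic scores and a 75% target to keep the numbers small. The function names are hypothetical and this is not GATK's actual tranche computation.

```python
import math


def cutoff_for_sensitivity(truth_scores, sensitivity):
    """Return the lowest score that still keeps `sensitivity` of the truth scores."""
    ranked = sorted(truth_scores, reverse=True)
    keep = math.ceil(sensitivity * len(ranked))
    return ranked[keep - 1]


def sensitivity_at(truth_scores, cutoff):
    """Fraction of truth scores at or above the cutoff."""
    return sum(s >= cutoff for s in truth_scores) / len(truth_scores)


# Synthetic truth-site scores per sample, made up for illustration.
scores = {
    "S1": [5.0, 4.0, 3.0, 2.0],
    "S2": [5.0, 4.0, 1.0, -1.0],
}

pooled = [s for sample_scores in scores.values() for s in sample_scores]
cutoff = cutoff_for_sensitivity(pooled, 0.75)

print(cutoff)                                # 2.0
print(sensitivity_at(pooled, cutoff))        # 0.75
print(sensitivity_at(scores["S1"], cutoff))  # 1.0
print(sensitivity_at(scores["S2"], cutoff))  # 0.5
```

The pooled set meets the target exactly, while one sample sits above it and the other below, which is the same kind of discrepancy described for the per-sample results.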

  • ying_sheng_1 (Member) ✭✭

    Could I get some input on the following question?

    What are the drawbacks of giving VariantRecalibrator multiple VCF files (each generated by calling variants from one BAM file), instead of one VCF (with variants called jointly from multiple BAM files)?

