General variant detection pipeline
I'm a bit uncertain about the optimal pipeline for calling variants. I've sequenced a population sample of ~200 individuals at high coverage (~30X), and I have no prior information on nucleotide variation (i.e. no known variant sites).
The most rigorous pipeline would seem to be:
1. Call variants with UnifiedGenotyper (UG) on the 'raw' (realigned) BAMs.
2. Extract high-confidence variants (high QUAL, high DP, not near indels or repeats, high MAF).
3. Perform BQSR using the high-confidence variants.
4. Call variants with HaplotypeCaller on the recalibrated BAMs.
5. Perform VQSR using high-confidence variants.
6. Any other hard filters.
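
To make step 2 concrete, here is a rough sketch of the kind of filter I have in mind, written in plain Python over uncompressed VCF text. Everything specific in it is an assumption for illustration: the thresholds (`MIN_QUAL`, `MIN_DP`, `MIN_MAF`, `INDEL_WINDOW`) are made-up values, the function names are hypothetical, and it assumes the site-level `DP` and `AF` INFO fields are present. It does not handle the repeat filter, which would need a separate interval file, and it measures indel distance from the indel's POS only.

```python
from typing import Dict, Iterable, List

# Hypothetical thresholds, chosen only for illustration; tune them per dataset.
MIN_QUAL = 100.0    # minimum site QUAL
MIN_DP = 1000       # minimum site-level depth summed over all samples
MIN_MAF = 0.05      # minimum minor allele frequency
INDEL_WINDOW = 10   # drop SNPs within this many bp of an indel's POS


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'DP=100;AF=0.2' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def is_indel(ref: str, alt: str) -> bool:
    """True if any ALT allele differs in length from REF."""
    return any(len(allele) != len(ref) for allele in alt.split(","))


def high_confidence_snps(vcf_lines: Iterable[str]) -> List[str]:
    """Return the biallelic SNP records that pass every threshold above."""
    records = [
        line.rstrip("\n").split("\t")
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    ]
    # Collect indel positions first so SNPs can be checked against them.
    indel_sites = [(r[0], int(r[1])) for r in records if is_indel(r[3], r[4])]

    kept = []
    for r in records:
        chrom, pos, ref, alt, qual = r[0], int(r[1]), r[3], r[4], r[5]
        info = parse_info(r[7])
        if is_indel(ref, alt) or "," in alt:
            continue  # keep biallelic SNPs only
        if qual == "." or float(qual) < MIN_QUAL:
            continue
        if int(info.get("DP", 0)) < MIN_DP:
            continue
        af = float(info.get("AF", 0))
        if min(af, 1.0 - af) < MIN_MAF:
            continue
        if any(c == chrom and abs(p - pos) <= INDEL_WINDOW for c, p in indel_sites):
            continue
        kept.append("\t".join(r))
    return kept
```

The surviving records, written out with the original header, would then be the file passed as the known-sites set in step 3.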
Is this excessive? Does using HaplotypeCaller negate the need for *QSR (BQSR and VQSR)? Is it worthwhile performing VQSR if BQSR hasn't been done? Otherwise I'm just running HaplotypeCaller on un-recalibrated BAMs and then hard-filtering.