
General variant detection pipeline

Will_Gilks, University of Sussex, UK (Member)

I'm a bit uncertain about the optimal pipeline for calling variants. I've sequenced a population sample of ~200 individuals at high coverage (~30X), with no prior information on nucleotide variation.

The most rigorous pipeline would seem to be:
1. Call variants with UnifiedGenotyper (UG) on 'raw' (realigned) BAMs.
2. Extract high-confidence variants (high QUAL, high DP, not near indels or repeats, high MAF).
3. Perform BQSR using the high-confidence variants as the known sites.
4. Call variants with HaplotypeCaller on the recalibrated BAMs.
5. Perform VQSR using high-confidence variants.
6. Any other hard filters.
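
To make step 2 concrete, here is a minimal Python sketch of how I imagine extracting the high-confidence set from the UG output. It assumes a plain-text, multi-sample VCF with DP and AF in the INFO column and keeps only biallelic SNPs. The function names and the thresholds (`min_qual`, `min_depth`, `min_maf`) are placeholders I made up for illustration, not recommended values, and the "not near indels or repeats" criterion is not handled here.

```python
def parse_info(info_field):
    """Turn an INFO string such as 'DP=100;AF=0.25;DB' into a dict.

    Flag entries without '=' (e.g. 'DB') are stored with the value True.
    """
    info = {}
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
        elif entry:
            info[entry] = True
    return info


def is_high_confidence(vcf_line, min_qual=100.0, min_depth=1000, min_maf=0.05):
    """Return True if a VCF data line passes the (hypothetical) thresholds.

    Only biallelic SNPs are considered; records with missing QUAL, DP or AF
    are rejected rather than guessed at.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False
    ref, alt = fields[3], fields[4]
    # Skip indels, multi-allelic sites and symbolic alleles.
    if len(ref) != 1 or len(alt) != 1 or alt not in "ACGT":
        return False
    info = parse_info(fields[7])
    try:
        qual = float(fields[5])
        depth = int(info["DP"])   # total depth summed over all samples
        alt_freq = float(info["AF"])
    except (KeyError, ValueError):
        return False
    maf = min(alt_freq, 1.0 - alt_freq)  # minor allele frequency
    return qual >= min_qual and depth >= min_depth and maf >= min_maf


def filter_vcf_lines(lines, **thresholds):
    """Yield header lines unchanged plus the data lines that pass the filter."""
    for line in lines:
        if line.startswith("#") or is_high_confidence(line, **thresholds):
            yield line
```

The idea would be to write the surviving lines to a new VCF and use that file as the known-sites input for BQSR in step 3, and possibly as a training resource for VQSR in step 5.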

Is this excessive? Does using HaplotypeCaller remove the need for BQSR or VQSR? Is it worthwhile performing VQSR if BQSR hasn't been done? Otherwise I'm just running HaplotypeCaller on un-recalibrated BAMs and then hard-filtering.


