Best Practices for Variant Discovery in DNAseq

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,822Administrator, GATK Developer admin

This article is part of the Best Practices documentation. See for the full documentation set.

This is our recommended workflow for calling variants in DNAseq data from cohorts of samples, in which steps from data processing up to variant calling are performed per-sample, and subsequent steps are performed jointly on all the individuals in the cohort.


The workflow is divided in three main sections that are meant to be performed sequentially:

  • Data cleanup: from raw DNAseq sequence reads (FASTQ files) to analysis-ready reads (BAM files)
  • Variant discovery: from reads (BAM files) to variants (VCF files)
  • Evaluation: callset QC, refinement and preliminary analyses

Important note on GATK versions

The Best Practices have been updated for GATK version 3. If you are running an older version, you should seriously consider upgrading. For more details about what has changed in each version, please see the Version History section. If you cannot upgrade your version of GATK for any reason, please look up the corresponding version of the GuideBook PDF (also in the Version History section) to ensure that you are using the appropriate recommendations for your version.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD


Sign In or Register to comment.