This article is part of the workflow documentation describing the Best Practices for Variant Discovery in DNAseq data. See http://www.broadinstitute.org/gatk/guide/best-practices for the full workflow.
Once you've pre-processed your data according to our recommendations, you are ready to undertake the variant discovery process, i.e. identify the sites where your data displays variation relative to the reference genome, and calculate genotypes for each sample at those sites. Unfortunately, some of the variation you observe is caused by mapping and sequencing artifacts, so the greatest challenge here is to balance the need for sensitivity (minimizing false negatives, i.e. failing to identify real variants) against the need for specificity (minimizing false positives, i.e. failing to reject artifacts). We have found that it is very difficult to reconcile these objectives in a single step, so instead we decompose the variant discovery process into three separate steps: variant calling (performed per-sample), joint genotyping (performed per-cohort) and variant filtering (also performed per-cohort). The first two steps are designed to maximize sensitivity, while the filtering step aims to deliver a level of specificity that can be customized for each project.
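To make the sensitivity/specificity trade-off concrete, here is a minimal Python sketch. It is an illustration only, not part of the GATK tools: the candidate sites, their scores, and the single-threshold filter are all hypothetical, and real variant filtering uses richer annotations than one score. It shows how a lenient threshold keeps every real variant but lets artifacts through, while a strict one rejects the artifacts at the cost of losing a real variant.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were kept: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of artifacts that were rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical candidate sites: (score, is_real_variant).
# These values are made up for illustration.
candidates = [
    (90, True), (75, True), (60, True), (40, True),
    (55, False), (35, False), (20, False), (10, False),
]


def evaluate(threshold):
    """Keep sites scoring >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for score, real in candidates if score >= threshold and real)
    fn = sum(1 for score, real in candidates if score < threshold and real)
    fp = sum(1 for score, real in candidates if score >= threshold and not real)
    tn = sum(1 for score, real in candidates if score < threshold and not real)
    return sensitivity(tp, fn), specificity(tn, fp)


lenient = evaluate(30)  # keeps all real variants, but also two artifacts
strict = evaluate(58)   # rejects all artifacts, but drops one real variant
print("lenient:", lenient)
print("strict: ", strict)
```

With this toy data no single threshold gives perfect sensitivity and perfect specificity at once, which is the motivation for calling permissively first and tuning specificity in a separate filtering step.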
Geraldine Van der Auwera, PhD