For a quick introduction to the GATK and the purpose of the Best Practices workflow, please see the intro talks from our latest workshop in the Presentations section.
The Best Practices workflow is divided into three main sections that are meant to be performed sequentially:
Data pre-processing: from raw sequence reads (FASTQ files) to analysis-ready reads (BAM files)
Variant discovery: from reads (BAM files) to variants (VCF files)
Suggested preliminary analyses
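As a rough illustration of what "performed sequentially" means here, the sketch below models the three sections as phases with an input and an output file format and checks that each phase produces what the next one consumes. The names Phase, PHASES and check_sequential are hypothetical and are not part of the GATK. The VCF input for the third phase is an assumption, since the text above does not state what the preliminary analyses consume or produce.

```python
from typing import List, NamedTuple, Optional


class Phase(NamedTuple):
    """One section of the workflow (hypothetical model, not a GATK class)."""
    name: str
    input_format: str
    output_format: Optional[str]  # None when the text does not state an output


PHASES = [
    Phase("Data pre-processing", "FASTQ", "BAM"),
    Phase("Variant discovery", "BAM", "VCF"),
    # Assumption: preliminary analyses start from the VCF produced above.
    Phase("Suggested preliminary analyses", "VCF", None),
]


def check_sequential(phases: List[Phase]) -> bool:
    """Return True if every phase's output format is the next phase's input format."""
    return all(a.output_format == b.input_format
               for a, b in zip(phases, phases[1:]))


if __name__ == "__main__":
    for phase in PHASES:
        print(f"{phase.name}: {phase.input_format} -> {phase.output_format}")
    print("Phases chain in order:", check_sequential(PHASES))
```

Reordering the phases makes the check fail, which is the point of running them in the order listed.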
Important notes on context and caveats
This workflow was developed by the GATK development team over years of analysis work on many of the Broad Institute's sequencing projects. As a general rule, the command-line arguments and parameters given in the documentation examples are meant to be broadly applicable.
However, our testing focuses largely on data from human whole-genome or whole-exome samples sequenced with Illumina technology. If you are working with different types of data or experimental designs, you may need to adapt certain branches of the workflow, as well as certain parameter selections and values. Unfortunately, we are currently not able to provide official recommendations on how to deal with very divergent data types (such as Ion Torrent) or experimental designs (such as RNAseq).
In addition, the illustrations and tutorials provided in these pages tend to assume a simple experimental design where each sample is used to produce one DNA library that is sequenced separately on one lane of the machine.
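To see whether your own data matches this simple design, a quick check along the following lines may help. This is a minimal sketch that assumes you can list your sequencing runs as (sample, library, lane) tuples. The function name find_non_simple_samples and the record layout are hypothetical and are not part of the GATK. It only checks that each sample has a single library and a single lane; it does not check whether a lane is shared between samples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_non_simple_samples(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return samples that have more than one library or more than one lane.

    Each record is a (sample, library, lane) tuple. The result maps each
    offending sample to its sorted libraries and sorted lanes.
    """
    libraries = defaultdict(set)
    lanes = defaultdict(set)
    for sample, library, lane in records:
        libraries[sample].add(library)
        lanes[sample].add(lane)
    return {
        sample: (sorted(libraries[sample]), sorted(lanes[sample]))
        for sample in libraries
        if len(libraries[sample]) > 1 or len(lanes[sample]) > 1
    }


if __name__ == "__main__":
    # Hypothetical run list: sampleB was sequenced on two lanes.
    runs = [
        ("sampleA", "libA1", "lane1"),
        ("sampleB", "libB1", "lane2"),
        ("sampleB", "libB1", "lane3"),
    ]
    print(find_non_simple_samples(runs))
```

An empty result means every sample follows the one-library, one-lane layout assumed by the illustrations and tutorials.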
Finally, please be aware that several key steps in the Best Practices workflow make use of existing resources such as known variants, which are readily available for humans (we provide several useful resource datasets for download from our FTP server). If no such resources are available for your organism, you may need to bootstrap your own or use alternative methods. We have documented useful methods to do this wherever possible, but be aware that some issues currently remain without a good solution.
Important note on GATK versions
The Best Practices have been updated for GATK version 2.7. If you are running an older version, you should seriously consider upgrading. For more details about what has changed in each version, please see the Version History section. If you cannot upgrade your version of GATK for any reason, please look up the corresponding version of the GuideBook PDF (also in the Version History section) to ensure that you are using the appropriate recommendations for your version.