GATK on Intel BIGstack for on-premises infrastructure
Broad-Intel Genomics Stack (BIGstack) is an end-to-end, optimized solution on Intel hardware for analyzing genomic data. It provides an efficient way to run pre-packaged, optimized workflows, including the GATK Best Practices workflows.
BIGstack’s software stack includes two components developed by Intel for efficient and scalable execution of genomics workflows: GenomicsDB and the Genomics Kernel Library (GKL). GenomicsDB is a data store for genomic variants. It is based on the TileDB array storage manager, a system for efficiently storing, querying, and accessing sparse and dense matrix/array data. GKL is a collection of common, compute-intensive kernels used in genomic analysis tools. Intel and The Broad Institute worked together to identify these kernels in GATK, and experts across Intel optimized the kernels for Intel architecture.
BIGstack also includes support to run other open-source libraries of genomic analysis tools: Picard, BWA, and Samtools. These tools perform a wide variety of tasks, from sorting and fixing tags to generating recalibration models. Users specify the files to be analyzed, what tools they want to use, and the order in which the execution engine (Cromwell) performs the tasks using Workflow Description Language (WDL) files.