
GATK on Intel BIGstack for on-premises infrastructure

Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
edited January 9 in Pipelining Options

Broad-Intel Genomics Stack (BIGstack) is an end-to-end, optimized solution on Intel hardware for analyzing genomic data. It provides an efficient way to run pre-packaged, optimized workflows, including the GATK Best Practices workflows.

BIGstack’s software stack includes two components developed by Intel for efficient and scalable execution of genomics workflows: GenomicsDB and the Genomics Kernel Library (GKL). GenomicsDB is a data store for genomic variants. It is based on the TileDB array storage manager, a system for efficiently storing, querying, and accessing sparse and dense matrix/array data. GKL is a collection of common, compute-intensive kernels used in genomic analysis tools. Intel and The Broad Institute worked together to identify these kernels in GATK, and experts across Intel optimized the kernels for Intel architecture.
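To make the idea of sparse array storage a little more concrete, here is a minimal Python sketch. It is not GenomicsDB or TileDB code and does not use their APIs; the class `SparseVariantGrid`, its methods, and the sample data are all hypothetical, chosen only to show why storing just the non-empty cells of a sample-by-position matrix can be economical.

```python
from typing import Dict, List, Tuple


class SparseVariantGrid:
    """Toy sparse 2-D array: rows are samples, columns are genomic positions.

    Hypothetical illustration only; this is not how GenomicsDB is implemented.
    """

    def __init__(self) -> None:
        # Only cells that actually hold a variant call are stored.
        self._cells: Dict[Tuple[int, int], str] = {}

    def put(self, sample: int, position: int, genotype: str) -> None:
        """Record a genotype for one sample at one position."""
        self._cells[(sample, position)] = genotype

    def query(self, start: int, end: int) -> List[Tuple[int, int, str]]:
        """Return (sample, position, genotype) for start <= position <= end."""
        hits = [
            (sample, position, genotype)
            for (sample, position), genotype in self._cells.items()
            if start <= position <= end
        ]
        return sorted(hits)


# Made-up example data: three variant calls across three samples.
grid = SparseVariantGrid()
grid.put(0, 1_000_150, "0/1")
grid.put(2, 1_000_150, "1/1")
grid.put(1, 2_500_000, "0/1")

# Ask for every call that falls inside one region.
region_hits = grid.query(1_000_000, 1_000_200)
```

The grid here holds three entries even though the coordinate space spans millions of positions. A real array store would add indexing and on-disk layout on top of this idea, which the linear scan in `query` does not attempt.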

BIGstack also supports other open-source genomic analysis toolkits: Picard, BWA, and Samtools. These tools perform a wide variety of tasks, from sorting and fixing tags to generating recalibration models. Using Workflow Description Language (WDL) files, users specify the files to be analyzed, the tools to use, and the order in which the execution engine (Cromwell) performs the tasks.
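As a rough illustration of what "specifying the order of tasks" means, the Python sketch below derives an execution order from declared task dependencies. It is neither WDL nor Cromwell code; the task names and the `execution_order` helper are hypothetical, and the only library call used is `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical task graph: each task maps to the set of tasks it depends on.
# The names are illustrative labels, not real tool invocations.
tasks: Dict[str, Set[str]] = {
    "align_reads": set(),
    "sort_alignments": {"align_reads"},
    "recalibrate": {"sort_alignments"},
    "call_variants": {"recalibrate"},
}


def execution_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return the tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(tasks)
for step, name in enumerate(order, start=1):
    print(f"{step}. {name}")
```

In this sketch the dependencies form a single chain, so only one valid order exists. A workflow with independent branches would admit several valid orders, and an execution engine could run those branches in parallel.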

For more information, check out www.intel.com/broadinstitute and www.intel.com/selectsolutions.
