Cromwell and WDL are now officially citable in literature!
They have a DOI and everything! Kate Voss presented this poster at the Bioinformatics Open-Source Conference in Prague last month. See the abstract below. The full citation is:
Voss K, Gentry J and Van der Auwera G. Full-stack genomics pipelining with GATK4 + WDL + Cromwell [version 1; not peer reviewed]. F1000Research 2017, 6(ISCB Comm J):1379 (poster) (doi: 10.7490/f1000research.1114631.1)
In case you're wondering, the flying pig with warp nacelles is Jamie, the mascot of the Cromwell execution engine. And the story behind that... is a story for another time
GATK4 is the new major version of the Genome Analysis Toolkit (GATK), one of the most widely used software toolkits for germline short variant discovery and genotyping in whole genome and exome data. For genomics analysts, this new version greatly expands the toolkit's scope of action within the variant discovery space and provides substantial performance improvements with the aim of shortening runtimes and reducing cost of analysis. But it also offers significant new advantages for developers, including a completely redesigned and streamlined engine that provides more flexibility, is easier to develop against, and supports new technologies such as Apache Spark and cloud platform functionalities (e.g. direct access to files in Google Cloud Storage).
WDL and Cromwell are a Workflow Definition Language and a workflow execution engine, respectively. The imperative that drives WDL’s development is to make authoring analysis workflows more accessible to analysts and biomedical scientists, while leaving as much as possible of any runtime complexity involved to the execution engine.
GATK4, WDL and Cromwell are all developed by the Data Sciences Platform (DSP) at the Broad Institute and released under a BSD 3-clause license. For more information on GATK’s recent licensing change, please see https://software.broadinstitute.org/gatk/blog?id=9645.
Taken together, these three components constitute a pipelining solution that is purposely integrated from the ground up, although they can each be used independently and in combination with other packages through features that maximize interoperability. This principle of integration applies equally to development, to deployment in production at the Broad, and to support provided to the external community.