Pipelining recommendations

edited January 9 in Pipelining Options

We use Cromwell + WDL for all batch execution purposes. WDL is a community-driven, user-friendly scripting language managed by the OpenWDL organization. Cromwell is an open-source workflow execution engine that can connect to a variety of platforms, both local and cloud, through pluggable backends.

We take advantage of Cromwell's flexibility in our own work all the time: we do some of our development work (mostly small-scale) on the Broad's local cluster via SGE, and we run the rest of our development work (mostly large-scale) as well as the Broad's production pipelines on Google Cloud. This allows us to run exactly the same scripts regardless of the compute environment we choose (or need) to use at any point.

What's more, it allows us to make these workflows available publicly so that anyone in the research community can use them -- either in their own compute environment if they are using a platform that supports Cromwell, or through FireCloud, a cloud-based analysis portal developed by the Data Sciences Platform group, which the GATK development team is part of. All of our Best Practices workflows are available as versioned WDLs on GitHub under a dedicated organization called gatk-workflows.


Supported platform options

We care greatly about providing the research community with a range of options for running our Best Practices workflows as implemented by our development team. To that end, we are working with multiple industry partners to provide easy-to-use services for running our GATK Best Practices WDL scripts on both local computing infrastructures and public cloud platforms. For some background information on this effort, please see this 2016 announcement.

Post edited by Geraldine_VdAuwera on