What do I need to set up to write and execute WDL workflows?
Below is a list of what you need in order to run workflows written in WDL with the Cromwell execution engine (the engine we use), with installation instructions where necessary. Because most of the tutorials and example WDL scripts on this website use GATK, we also link to GATK installation instructions; these are optional if you don’t plan to run the GATK WDLs.
There are additional resources that may help you work with WDL; see the Toolkit page for a full list.
WDLTool is a utility package that provides accessory functionality for writing and running WDL scripts, including syntax validation and input template generation. You can download the latest release of the pre-compiled executable here.
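As a rough sketch of the two functions mentioned above, the commands below write a minimal WDL script and, if a WDLTool jar is present, validate it and generate an input template. The file names `hello.wdl` and `hello_inputs.json`, the `WDLTOOL_JAR` variable, and the default jar name `wdltool.jar` are placeholders chosen for this example. The `validate` and `inputs` subcommands are assumed to be available in the release you download, so check its usage message first.

```shell
# Write a minimal WDL script (file name is a placeholder for this example).
# The quoted 'EOF' keeps the shell from expanding ${name} inside the script.
cat > hello.wdl <<'EOF'
task hello {
  String name
  command {
    echo "Hello, ${name}!"
  }
  output {
    String greeting = read_string(stdout())
  }
}

workflow helloWorkflow {
  call hello
}
EOF

# Path to the downloaded WDLTool jar (assumed name; adjust to your download).
WDLTOOL_JAR="${WDLTOOL_JAR:-wdltool.jar}"

if [ -f "$WDLTOOL_JAR" ]; then
    # Check the script for syntax errors.
    java -jar "$WDLTOOL_JAR" validate hello.wdl
    # Generate a JSON template listing the inputs the workflow expects.
    java -jar "$WDLTOOL_JAR" inputs hello.wdl > hello_inputs.json
else
    echo "WDLTool jar not found at $WDLTOOL_JAR; skipping validation"
fi
```

The generated template can then be filled in with real values (here, a value for the `name` input) before the workflow is run.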
You will need a text editor to write your WDL scripts. Note that a word processor (like Microsoft Word) is not the same thing as a text editor (like Notepad); please use a text editor. If you have no preferred text editor, we recommend installing SublimeText, which we find displays code more clearly than the other text editors we've tried. As an added convenience when developing WDL workflows, syntax highlighting is available for SublimeText, TextMate, vim, and IntelliJ. You can follow the links for installation instructions for your editor of choice.
Cromwell is an execution engine that runs scripts written in WDL, which describe data processing and analysis workflows built from command line tools (such as pipelines implementing the GATK Best Practices for Variant Discovery). If you are familiar with GATK, you may have heard of or even used an execution engine called Queue that was designed to run GATK workflows written as Qscripts. Together, Cromwell and WDL constitute a user-friendly alternative to Queue and Qscripts.
The installation of Cromwell itself is quite simple. The latest release can be downloaded here in the form of a pre-compiled jar. For ease of use, you can also add an environment variable to your terminal profile pointing at the Cromwell jar file.
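One possible way to set up the environment variable is sketched below. The variable name `CROMWELL_JAR`, the path `$HOME/tools/cromwell.jar`, and the `cromwell` wrapper function are all assumptions made for this example; substitute the location where you actually saved the jar.

```shell
# Add these lines to your terminal profile (for example ~/.bash_profile).
# Hypothetical location of the downloaded jar; adjust to your own path.
export CROMWELL_JAR="$HOME/tools/cromwell.jar"

# Optional convenience wrapper: `cromwell <args>` expands to
# `java -jar "$CROMWELL_JAR" <args>`.
cromwell() {
    java -jar "$CROMWELL_JAR" "$@"
}
```

With this in place, a workflow would typically be launched with something like `cromwell run hello.wdl` plus an inputs file. How the inputs file is passed differs between Cromwell releases, so check the usage message of the release you downloaded.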
Cromwell requires Java version 8, which you can find here.
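To see which Java version is currently on your path, a check along the following lines may help. The helper name `java_major` is made up for this example, and the parsing assumes the usual `java -version` output, in which the version appears in double quotes on the first line.

```shell
# Turn a Java version string into its major version number.
# Legacy scheme: "1.8.0_291" -> 8; newer scheme: "11.0.2" -> 11.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; take the quoted token on line 1.
    version=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
    echo "Found Java $version (major version $(java_major "$version"))"
else
    echo "No java executable found on PATH"
fi
```

A reported major version of 8 corresponds to the Java 8 (also written 1.8) that Cromwell requires.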
Cromwell is capable of utilizing Docker images to assist in specifying environments when running workflows. If you’ve never worked with Docker before, this page may answer many of your questions. Docker is optional if you are simply working on your local machine (i.e. your computer rather than a remote server). If you are using a remote server, more often than not Docker is required. In our tutorials, we always tell you which optional installations will be required.
To use Docker, please install it according to your operating system, following the instructions given on the installation page.
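After installing, you may want to confirm that Docker is usable from your terminal. The sketch below is one way to do that; the function name `docker_status` and its three status messages are invented for this example.

```shell
# Report whether the docker client is installed and can reach the daemon.
docker_status() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "not installed"
    elif docker info >/dev/null 2>&1; then
        # `docker info` succeeds only when the client can talk to the daemon.
        echo "ready"
    else
        echo "installed, daemon not reachable"
    fi
}

docker_status
```

If the status is `ready`, running `docker run hello-world` is a common end-to-end check that images can be pulled and containers started.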
Programs to be pipelined
Our tutorials feature tools from the GATK (Genome Analysis Toolkit) and Picard to demonstrate how to write WDL scripts that perform real data processing and analysis tasks; in order to follow them you’ll need to install GATK, Picard, and their dependencies. To that end, you can find a complete walkthrough for installing these on the GATK website. The linked document provides instructions for installing several additional software packages that are useful for GATK-specific tutorials, but the only one that you really need to install for running the WDL tutorials, besides GATK and Picard, is Java 1.7*. Installing the R library gsalib (available on CRAN) is optional but highly recommended. When following along with a tutorial on this website, we will always tell you which optional installations will be required. Note that GATK and Cromwell currently require different versions of Java, so see this article for help dealing with that temporary problem.
*Note: As of version 3.6, GATK runs with Java version 1.8 (that is, Java 8, the same version Cromwell requires). You will not need Java 1.7 if you use GATK 3.6 or later.
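If you do end up needing two Java versions side by side (for example, an older GATK on Java 1.7 and Cromwell on Java 8), one possible approach is to choose the JDK per command rather than changing it globally. This is only an illustrative sketch and not necessarily what the linked article recommends; the `with_java` helper, the `JAVA7_HOME` and `JAVA8_HOME` variables, and the default paths are all hypothetical.

```shell
# Hypothetical install locations; substitute the paths on your system.
JAVA7_HOME="${JAVA7_HOME:-/usr/lib/jvm/java-7-openjdk}"
JAVA8_HOME="${JAVA8_HOME:-/usr/lib/jvm/java-8-openjdk}"

# Run one command with the given JDK first on PATH, leaving the
# environment of the surrounding shell session unchanged.
with_java() {
    java_home="$1"
    shift
    env JAVA_HOME="$java_home" PATH="$java_home/bin:$PATH" "$@"
}
```

For example, `with_java "$JAVA8_HOME" java -version` would report the Java 8 installation, while `with_java "$JAVA7_HOME" java -version` would report the Java 1.7 one, assuming both paths point at real installations.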