Are there planned WDL improvements to separate tasks from execution?
For software development reasons (testing, code reuse, etc.), it is better to keep WDL tasks small. Currently, however, small tasks are a problem on cluster and cloud backends because each one incurs extra overhead to start a job and to communicate between nodes. The Harvard Medical School clusters have automated systems built in to prevent running many (hundreds or more) small jobs that complete in under a minute, because this incurs a large and wasteful amount of overhead. So on a real system, small tasks are not always possible or desirable.
The difficulty, as I see it, is that there is no way to separate the definition of a WDL task from its execution. In an ideal language, I think I should be able to define a task for one set of inputs, then specify separately how I want it to run for multiple sets of inputs without rewriting the task. It is easy to scatter jobs in WDL across many compute nodes, but it is not easy to run them serially on one node, or to scatter them across just a few nodes.
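To make this concrete, here is a rough sketch in WDL 1.0 syntax. The task names, input names, and the `wc -l` command are made up for illustration and do not come from a real pipeline. The point is that running the same logic serially in one job means writing a second task by hand.

```wdl
version 1.0

# Hypothetical small task: counts the lines in one file.
task count_lines {
  input {
    File in_file
  }
  command <<<
    wc -l < ~{in_file}
  >>>
  output {
    Int n = read_int(stdout())
  }
}

# The same logic rewritten by hand so that all files are processed
# serially inside a single job. Assumes file paths contain no spaces.
task count_lines_batch {
  input {
    Array[File] in_files
  }
  command <<<
    for f in ~{sep=" " in_files}; do
      wc -l < "$f"
    done
  >>>
  output {
    # One count per input file, kept as strings for simplicity.
    Array[String] n = read_lines(stdout())
  }
}

workflow count_all {
  input {
    Array[File] files
  }

  # Option 1: one job per file. Easy to write, but produces
  # many very short jobs when the array is large.
  scatter (f in files) {
    call count_lines { input: in_file = f }
  }

  # Option 2: one job for all files. Needs the separate batch task.
  call count_lines_batch { input: in_files = files }
}
```

What I would like is a way to get the behavior of option 2, or something in between (say, a handful of jobs that each process a slice of the array), while only ever defining `count_lines`.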
It would also help to be able to chain tasks to run on a single execution job. For a simple chain of jobs A, B, and C: A->B->C, each task runs on a separate execution job. It would be nice to be able to aggregate all of these to run on a single execution job without rewriting the component tasks.
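As a rough sketch of the chaining case (WDL 1.0 syntax; the commands inside A, B, and C are placeholders I made up, and only the A->B->C shape matters): with three small tasks, each call becomes its own job, and the only way I know to get a single job is to write a merged task by hand.

```wdl
version 1.0

# Three small, individually testable tasks (commands are placeholders).
task A {
  input {
    File x
  }
  command <<<
    sort ~{x} > a.out
  >>>
  output {
    File out = "a.out"
  }
}

task B {
  input {
    File x
  }
  command <<<
    uniq ~{x} > b.out
  >>>
  output {
    File out = "b.out"
  }
}

task C {
  input {
    File x
  }
  command <<<
    wc -l < ~{x}
  >>>
  output {
    Int n = read_int(stdout())
  }
}

# Hand-merged equivalent: one job, but it duplicates the logic of A, B, and C.
task ABC {
  input {
    File x
  }
  command <<<
    sort ~{x} | uniq | wc -l
  >>>
  output {
    Int n = read_int(stdout())
  }
}

workflow chain {
  input {
    File x
  }

  # Each call below is submitted as a separate execution job.
  call A { input: x = x }
  call B { input: x = A.out }
  call C { input: x = B.out }

  output {
    Int n = C.n
  }
}
```

Replacing the three calls with a single `call ABC` gives one job, but at the cost of maintaining `ABC` alongside the small tasks, which is exactly the rewriting I would like to avoid.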