Are there planned WDL improvements to separate tasks from execution?

mmah Member, Broadie

For software development reasons (testing, code reuse, etc.) it is better to have small WDL tasks. Currently, however, small tasks are a problem on cluster and cloud backends because each one incurs extra overhead to start a job and to communicate between nodes. The Harvard Medical School clusters have automated systems built in to prevent running many (hundreds or more) small jobs that complete in under a minute, because the scheduling overhead for such jobs is wasteful. So on a real system, small tasks are not always possible or desirable.

The difficulty as I see it is that there is no way to separate the definition of a WDL task from its execution. In an ideal language, I think I should be able to define a task for one set of inputs, then specify separately how I want it to run for multiple sets, without rewriting the task. It is easy to scatter jobs in WDL across many compute nodes, but it is not easy to run them serially on one node, or to scatter across just a few nodes.
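
To make the "few nodes" case concrete, here is a rough sketch of the separation described above, written in Python rather than WDL since WDL has no such construct today. The names `split_into_batches`, `run_batch`, and `count_chars` are hypothetical and only for illustration: the task is defined for a single input, and the batching is decided separately.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_batches(items: Sequence[T], n_batches: int) -> List[List[T]]:
    """Split items into at most n_batches contiguous, near-equal batches."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    if not items:
        return []
    n_batches = min(n_batches, len(items))
    base, extra = divmod(len(items), n_batches)
    batches: List[List[T]] = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches take one additional item each.
        size = base + (1 if i < extra else 0)
        batches.append(list(items[start:start + size]))
        start += size
    return batches


def run_batch(task: Callable[[T], R], batch: Sequence[T]) -> List[R]:
    """Run a single-input task serially over one batch (one execution job)."""
    return [task(item) for item in batch]


def count_chars(sample: str) -> int:
    """Hypothetical small task, defined for exactly one input."""
    return len(sample)


samples = ["s1", "s22", "s333", "s4444", "s55555"]
# n_batches=1 means "serial on one node"; n_batches=len(samples) is a full scatter.
batches = split_into_batches(samples, 2)
results = [run_batch(count_chars, batch) for batch in batches]
```

The point is that `count_chars` never changes; only the `n_batches` argument does. In a real workflow each `run_batch` call would be one submitted job.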

It would also help to be able to chain tasks so that they run in a single execution job. For a simple chain of jobs A, B, and C (A->B->C), each task currently runs as a separate execution job. It would be nice to be able to aggregate all of these into a single execution job without rewriting the component tasks.
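
As a rough illustration of the chaining idea (again in Python, not WDL, with hypothetical names `chain`, `task_a`, `task_b`, and `task_c`), the component tasks stay small and independently testable, and the aggregation into one unit of execution is declared separately:

```python
from functools import reduce
from typing import Any, Callable


def chain(*tasks: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose tasks so that chain(a, b, c)(x) computes c(b(a(x)))."""
    def run_all(value: Any) -> Any:
        # Feed each task's output into the next one, in order.
        return reduce(lambda acc, task: task(acc), tasks, value)
    return run_all


# Hypothetical small tasks, each defined and testable on its own.
def task_a(text: str) -> str:
    return text.strip()


def task_b(text: str) -> list:
    return text.split(",")


def task_c(fields: list) -> int:
    return len(fields)


# A->B->C aggregated into a single callable, i.e. one execution job.
abc = chain(task_a, task_b, task_c)
result = abc("  x,y,z \n")
```

Here `abc` stands in for the single aggregated job; none of the three tasks had to be rewritten to build it.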

Best Answer

  • jgentry Member, Broadie, Dev
    Accepted Answer

    @mmah You've touched on a handful of good ideas which have been discussed in the past but as you note they don't currently exist.

    In the near term, if you haven't already come across it, you can set concurrent-job-limit for your backend in the Cromwell config. It's a blunt instrument, but the use case was similar.

    In general I don't think that WDL is the right place to define any of this, although at the moment we don't have a great place where it would go. It turns out that runtime isn't a very good abstraction, for a lot of reasons (e.g. if I use your WDL I might completely disagree with it and can't override it). What we really need long term is the ability to separate the actual WDL from the specification of how to run things: not just the sort of thing you describe but also the CPU count, memory, etc. I think this is more or less what you're saying, and I'm just being pedantic about the use of "WDL".

    In terms of declaring that you want multiple jobs to be executed on a single node, this comes up frequently for other reasons too, e.g. avoiding the pain of localizing/delocalizing files that are shared between jobs requiring a similarly shaped node. As with everything being discussed here, it's on our radar but isn't imminent.

    It'd also be good to be able to have more expressive power (again, not directly tied to the WDL IMO) in terms of the shape of how jobs are run.

    Very very long term our goal is to have a component of Cromwell that can be pretty smart about this sort of thing and handle most of it magically, but that's a hard problem to solve.

Answers

  • mmah Member, Broadie

    @jgentry Sounds like you have a good handle on things. I just wanted to make sure these ideas are under consideration.

  • jgentry Member, Broadie, Dev

    @mmah They are, and it's all good stuff. It's just a matter of too many good ideas vs. cycles to do them :)
