Any way to set concurrent-job-limit dynamically?

We have a moderate number of GPU nodes available, which are sometimes heavily loaded, sometimes not. I'd like to be able to specify a limit on how many nodes to use when I submit a job. I'm able to do this by putting a hardcoded number into the runtime section of my tasks, and that works fine, but if I try to pass in a value, I get this error:

com.typesafe.config.ConfigException$WrongType: test/sge_options/sge_concurrency_test.conf: 36: concurrent-job-limit has type STRING rather than NUMBER

I've tried every combination of declarations of string versus number types I can think of, always with the same result. In particular, I can use either

concurrent-job-limit = 1
or
concurrent-job-limit = "1"

and everything works fine, but anything that substitutes any sort of variable value fails with the error shown above. E.g., this fails:

gpu_concurrency_limit = "1"
concurrent-job-limit = "${gpu_concurrency_limit}"

and it makes no difference whether the "1" is quoted or not. Any value of gpu_concurrency_limit passed in suffers the same fate.

Is this a Cromwell bug, or am I just missing something?
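
A note on the substitution syntax, offered as a likely explanation rather than a confirmed diagnosis: in HOCON (the format the Typesafe Config library parses), substitutions are not expanded inside quoted strings. A quoted "${gpu_concurrency_limit}" is therefore the literal text ${gpu_concurrency_limit}, which cannot be converted to a number, whereas a quoted "1" can. The sketch below assumes gpu_concurrency_limit is defined at the top level of the same config file. The environment variable name GPU_CONCURRENCY_LIMIT is hypothetical.

```
# Default value, defined at the root of the config file.
gpu_concurrency_limit = 1

# Optional override: takes effect only if the (hypothetical) environment
# variable GPU_CONCURRENCY_LIMIT is set when the server starts.
gpu_concurrency_limit = ${?GPU_CONCURRENCY_LIMIT}

# Unquoted, so the substitution is actually resolved.
concurrent-job-limit = ${gpu_concurrency_limit}
```

Even if this resolves the type error, the value is still read when the config is loaded, so it would change the limit per server start, not per submitted job.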

Answers

  • What I've done so far is to create a copy of the SGE backend that differs only in the concurrent job limit, which I've set to a low value. That works fine. I was hoping to be able to set that value at runtime, but it sounds like that's not possible. Changing the SGE submit parameters won't make any difference, because there are separate instances of the workflow running, with no communication between them. The WDL is a huge and hairy beast, so I'm reluctant to post it here. It sounds like I just need to resign myself to editing the config file and rerunning the server every time. Thanks.

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev)
    edited February 22

    This is certainly something that's been talked about. Don't hold your breath, but something like a REST API endpoint that lets us tune these scalability options in real time is likely in the near-to-medium future. And, as always, if you get there first, PRs into Cromwell are welcome!

  • Thib (Cambridge; Member, Broadie, Dev)

    @rgobbel Is what you would like a "per-task concurrent job limit"?
    I would also encourage you to file an issue on GitHub with a description of your use case :)
    https://github.com/broadinstitute/cromwell
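
The workaround described in the first answer (a second copy of the SGE backend that differs only in its concurrent job limit) could be sketched roughly as follows. This is an illustration under assumptions, not the poster's actual file: the provider name SGE_GPU and both limit values are hypothetical, and the existing SGE settings are elided. It relies on HOCON object concatenation, which should merge the override into a copy of the SGE provider.

```
backend {
  providers {
    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 100   # illustrative value
        # ... remaining SGE settings as in the existing config ...
      }
    }

    # Hypothetical second provider: same settings as SGE, lower limit.
    SGE_GPU = ${backend.providers.SGE} {
      config { concurrent-job-limit = 2 }   # illustrative value
    }
  }
}
```

The limit is still fixed when the server starts. What changes is that a workflow can be pointed at whichever provider has the limit it needs.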
