Is there a way to put a limit on the concurrency of a task during a scatter?

Let's say that I have a task processFoo that I want to scatter. For example:

scatter (foo in foos) {
   call processFoo { input: foo = foo }
}

Is there a way for me to say that I only want to run, let's say, ten instances of processFoo at a time?

I've been ending up with situations where Cromwell tries to use more processes than my Mac will allow me to run.

Thanks!

P.S. I know about the concurrent-job-limit setting that I can put in my application.conf, and I can work around the issue using this setting. But this setting behaves in ways I don't fully understand, and it doesn't give me the more fine-grained control that I would like.


  • concurrent-job-limit is the setting you need; just be sure to also set max-concurrent-workflows = 1, or Cromwell will run several workflows at the same time.

    I use

    max-concurrent-workflows = 1
    concurrent-job-limit = 2
    

    and it works as expected. Setting concurrent-job-limit to a higher value just tends to make the analysis I/O-bound, so instead I give each job 50% of the CPU cores on my PC using the --threads option (or equivalent) that most tools have.
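    For reference, here is a minimal sketch of where those two settings sit in an application.conf, assuming the default Local backend. The include line pulls in Cromwell's bundled defaults, and HOCON object merging is relied on to keep the rest of the default Local definition, so only the overridden keys are written; the values simply mirror the ones above.

    ```hocon
    include required(classpath("application"))

    system {
      # Allow only one workflow to run at a time
      max-concurrent-workflows = 1
    }

    backend {
      providers {
        Local {
          config {
            # At most 2 jobs run on the Local backend at the same time
            concurrent-job-limit = 2
          }
        }
      }
    }
    ```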

  • nessus42 Member
    edited May 2017

    Thanks for the feedback, Redmar_van_den_Berg. Unfortunately, the programs I am using don't have "--threads" options, so I'm not sure that solution will work well for me.

    And ChrisL, thanks for the status report on concurrent-job-limit and the advice about SGE and Google pipelines.

  • nessus42 Member

    P.S. Redmar, I just ran a test, and the SSD on my Mac won't become the bottleneck until more than 10 cores are running the job at full steam.

  • nessus42 Member

    @ChrisL said:
    One thing I would advise you is that in the long run you'll be well served to take the time getting set up with either SGE (or a similar HTC) or Google's pipelines API, where these kinds of resource allocations are just handled for you and you can expand elastically to however much compute is available to you.

    Hi ChrisL, I'm trying to wade through the Cromwell documentation, but I can't find information on how you control how many nodes a scatter would try to run on, either for SGE or for JES.

    I haven't used either of these before; if I had, perhaps the answers would be obvious to me from the documentation provided.

    Is there a tutorial somewhere else that shows you how to get your WDL job to run on SGE and/or JES and how to control how many nodes are used by a scatter?

    |>oug

  • ctorroja Member
    edited September 25

    Hi, I would like to add an example where limiting concurrent jobs per task would be useful, even though an SGE backend will accept everything and put it in the queue.

    Suppose my workflow has a task that queries a database or downloads something from the network. If I don't limit how many of these tasks run concurrently, I may end up overloading the database engine or saturating the network bandwidth.

    To me it would be great to be able to limit concurrency for some specific tasks in a workflow.

    Don't you think so?

  • mmah Member, Broadie ✭✭

    @ctorroja Within the current WDL constraints, I suggest using a loop or thread pool alternative to scattering to handle this use case.

  • ChrisL Cambridge, MA Member, Broadie, Moderator, Dev admin

    @ctorroja the good news is that concurrent-job-limit can now be applied to all backends (there's been progress since May last year!)

    Documentation is here: https://cromwell.readthedocs.io/en/cjl_hog_factor_doc/backends/Backends/#backend-job-limits
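    As a sketch of what "all backends" means in practice (and of the earlier SGE question): the same key goes in the config block of whichever provider you use. The provider name SGE and the limit of 10 below are assumptions for illustration; a working SGE provider also needs its submit and kill commands and related settings, which are omitted here.

    ```hocon
    backend {
      # Assumed provider name; use whatever name your own config defines
      default = "SGE"
      providers {
        SGE {
          config {
            # Maximum number of jobs Cromwell runs on this backend at the same time
            # (10 is an arbitrary example value)
            concurrent-job-limit = 10
          }
        }
      }
    }
    ```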

  • @ChrisL said:
    @ctorroja the good news is that concurrent-job-limit can now be applied to all backends (there's been progress since May last year!)

    Documentation is here: https://cromwell.readthedocs.io/en/cjl_hog_factor_doc/backends/Backends/#backend-job-limits

    Sorry Chris, but the documentation link seems broken. Anyway, I set concurrent-job-limit to 1 for my local workflow, but it still processes all the scattered files at the same time. Am I missing something?
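    One thing worth checking (an assumption, since the config file itself isn't shown here): the limit has to sit inside the config block of the provider that backend.default actually points to. A limit set on a different provider, or placed directly under the provider name rather than inside its config block, would likely not affect the local jobs. A sketch of the expected layout:

    ```hocon
    backend {
      # Provider used when no other backend is specified
      default = "Local"
      providers {
        Local {
          # A concurrent-job-limit placed at this level is outside the config block
          config {
            # The limit belongs here, inside the provider's config block
            concurrent-job-limit = 1
          }
        }
      }
    }
    ```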

  • Ruchi Member, Broadie, Moderator, Dev admin

    Hey @dario_romagnoli,

    This link should work: https://cromwell.readthedocs.io/en/stable/backends/Backends/#backend-job-limits

    Can you share your config and the version of Cromwell you're using? We use this limit quite often so I imagine something needs fixing in the configuration for it.

    Thanks!

  • dario_romagnoli Member
    edited December 5

    Sure. I used both versions 35 and 36, and I've attached my config file.
    Also, thank you for the link.
