Multiple backends for Cromwell

I am using WDL to set up workflows and Cromwell to execute them. I configured Cromwell to run on SGE and this is working nicely. However, I wondered whether it is possible to specify multiple backends, i.e. assign different backends to different tasks in the same way that the number of CPUs is assigned via the runtime parameters?
For many check/validate tasks it would be much faster just to run locally and not submit to the queueing system.

Thanks for the great software!

Best Answer

  • Ruchi admin
    Accepted Answer

    Hey @krdav,

    You should be able to choose a backend per task by using the runtime key backend in the same way you would declare cpu.
    For example:

    task a {
      ...
      command { ... }
      runtime {
        cpu: 3
        backend: "Local"
      }
    }
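
    As a fuller sketch of the same idea, the workflow below sends a quick check to one backend and a heavier step to another. It assumes the Cromwell configuration defines backend providers named "Local" and "SGE"; the task names, the input, and the placeholder commands are hypothetical. Runtime attributes are written in the key: value form of the WDL runtime section.

    ```wdl
    # Lightweight check: runs on the machine hosting Cromwell.
    task check_vcf_extension {
      String vcf_name

      command {
        case "${vcf_name}" in
          *.vcf|*.vcf.gz) echo "ok" ;;
          *) echo "unexpected extension: ${vcf_name}" >&2; exit 1 ;;
        esac
      }
      runtime {
        # Assumed to match a provider name in the Cromwell configuration.
        backend: "Local"
      }
      output {
        String status = read_string(stdout())
      }
    }

    # Heavier step: submitted to the queueing system.
    task heavy_step {
      command {
        echo "placeholder for a long-running tool"
      }
      runtime {
        cpu: 3
        # Assumed to match a provider name in the Cromwell configuration.
        backend: "SGE"
      }
      output {
        String message = read_string(stdout())
      }
    }

    workflow two_backends {
      String vcf_name

      call check_vcf_extension { input: vcf_name = vcf_name }
      call heavy_step
    }
    ```

    Tasks that omit the backend attribute should fall back to whichever backend the Cromwell configuration names as its default.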
    

Answers

  • ChrisL (Cambridge, MA) Member, Broadie, Moderator, Dev admin

    Hey @krdav -

    Out of interest, what kinds of checks are you trying to do? There's a chance we can do them directly in WDL!

    (Ideally you wouldn't have to resort to running an entire task just to do some simple checks so if we're able to incorporate new things into the base WDL language that'd be much neater long-term.)

    Just as a heads-up, at some point in the future we'll probably deprecate the runtime attribute backend in favor of allowing the submitter to decide which tasks to run where, but in the meantime your best bet is to follow @Ruchi's advice.

  • krdav Member, Broadie

    Thanks @Ruchi, this was exactly what I needed!
    I actually had thought about using the runtime stanza but didn't have the courage because of:
    https://gatkforums.broadinstitute.org/wdl/discussion/6704/runtime
    and:
    https://github.com/broadinstitute/cromwell#runtime-attributes
    Which does not mention this as a possibility.

    @ChrisL
    I would use backend: "Local" for lightweight sanity checks and everything that executes in a few minutes. For variant calling I follow the GATK guidelines pretty slavishly:
    https://github.com/broadinstitute/wdl/blob/develop/scripts/broad_pipelines/PublicPairedSingleSampleWf_170412.wdl
    And here the GetBwaVersion and CheckFinalVcfExtension tasks are two great examples of something I would push to local instead of submitting it as a job to the queue.
    I am not sure how the submitter should detect the appropriate backend and distinguish these lightweight tasks from something more demanding.
    Thanks for letting me know that there are plans to deprecate the attribute; in that case I might just stop using it, for future compatibility.

  • ChrisL (Cambridge, MA) Member, Broadie, Moderator, Dev admin

    @krdav thanks for stating your use case!

    I've added this issue to our issue tracker, since string-contains checks seem like a really useful thing to be able to do natively in WDL.

    For version checks, what you're doing by running locally is checking that the Cromwell server has the right version installed... but when you send the task to be executed remotely, a totally different version may be in use.

    In my opinion, the best thing to do here is to specify that the task should run on a docker image with a well known version of bwa, so that the check isn't required in the first place.
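
    A minimal sketch of that suggestion, assuming a backend that supports Docker. The image name and tag are hypothetical placeholders, not a real published image; the point is only that the version is fixed by the image rather than by whatever is installed on the node that runs the task.

    ```wdl
    task report_bwa_version {
      command {
        bwa 2>&1 | grep -e '^Version' | sed 's/Version: //'
      }
      runtime {
        # Hypothetical image reference; substitute an image you trust.
        docker: "example.org/tools/bwa:0.7.15"
      }
      output {
        String version = read_string(stdout())
      }
    }
    ```

    With the image pinned, this task should report the same version wherever it is dispatched, which is why a separate version check becomes redundant.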

  • krdav Member, Broadie

    Thanks for submitting the ticket.

    Hmm, I see your point regarding the versioning on login vs. compute nodes. For me this is trivial because I am using a shared filesystem, so I guess I will just specify the bwa version in the input json.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @krdav,

    I'm a little concerned that your solution introduces some failure points. It sounds like you know what you're doing and will be fine, but I'd like to spell out my concerns explicitly for the benefit of anyone else reading this who may not have thought through the implications.

    If you rewire the task to pull down the same docker as used in the workflow and check the bwa version, the result will be valid — but obviously you can’t just run it on any random bwa binary you have lying around. Similarly, the workaround of just plugging in the bwa version as an input is reasonable in principle but places the onus on you to check manually what’s in the docker image. It seems like an unnecessary optimization to me, to be honest.

  • krdav Member, Broadie

    Hi @Geraldine_VdAuwera,

    Thanks for the clarification. You are right, it is important to distinguish between the different behaviours of different backends and/or use of docker. In my use case I am running everything on a backend similar to SGE (Torque/PBS to be exact) and therefore all nodes share the same filesystem. All the tools I use are specified in the input json and therefore tool versioning is an integrated part of the workflow.
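
    One way this setup might look in WDL, as a sketch only: the workflow name, task name, and input name are hypothetical. The tool path is declared as a String rather than a File so that Cromwell passes the path through unchanged instead of localizing the binary, which relies on every node seeing the same shared filesystem.

    ```wdl
    # Fails early if the tool named in the inputs is missing or not executable.
    task check_tool {
      String tool_path

      command {
        test -x "${tool_path}" && echo "${tool_path}"
      }
      output {
        String resolved = read_string(stdout())
      }
    }

    workflow tool_versions {
      # Supplied in the inputs json under the key "tool_versions.bwa_path".
      String bwa_path

      call check_tool { input: tool_path = bwa_path }
    }
    ```

    Pointing bwa_path at a versioned install directory makes the version part of the workflow inputs, but as noted above it remains the submitter's job to confirm that the path really holds that version.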
