Does Cromwell prioritise finishing workflows?

Say I have a two-task workflow:
    workflow A {
      call B
      call C
    }
If I start 10,000 A workflows in quick succession, will Cromwell schedule all 10,000 B calls before any C call? Each C call starts only after its B call ends, so I think this is the behaviour, and it is sensible and understandable.
What techniques can I use to get C calls prioritised over B calls, so that workflows start finishing sooner? Are there any built-in Cromwell ways to do this?
Ideas I had: use a different backend for B and C, with the C backend always prioritised over the B backend, or use a backend parameter that does the same thing. Both are a bit annoying, as they start to tie the 'portable' WDL file tightly to the backend implementation.
Thanks
Best Answer
I think you should use scatter over an array of inputs. Something like this:
    workflow A {
      Array[File] array_of_files
      scatter (file in array_of_files) {
        call B { input: file = file }
        # `output` is a reserved word in WDL, so this assumes task B names its output `out`
        call C { input: file = B.out }
      }
    }
Answers
Agreed with @dario_romagnoli: for scatter jobs that's exactly what should happen. You should start seeing results from task C for any shards whose task B jobs complete sooner.
Hello dario,
In my example I have 10,000 separate workflows. I think you are suggesting I combine them all into one mega workflow using a scatter. This might work, but then an error in any one of them taints all the others.
Hello @Evan_Benn,
What do you mean by 10k workflows? Do you have 10k files to be analyzed separately?
I may be wrong, but I think there can be only one workflow per WDL script, hence any parallelization must occur within one workflow.
What you are asking, if I understood your question, requires starting 10,000 different runs, but that is something you would achieve with bash.
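For reference, the "start 10,000 different runs" idea could look roughly like the sketch below. It is written in Python rather than bash and assumes a Cromwell server is already running and accepting submissions on its REST endpoint `POST /api/workflows/v1`. The file name `A.wdl`, the input key `A.input_file`, the `inputs/` directory and the server URL are all hypothetical placeholders, not details from this thread. By default it only prints the curl commands (a dry run).

```python
import json
from pathlib import Path

# Hypothetical names: the workflow file, input key, and server URL are
# assumptions for illustration, not taken from the thread.
WDL_PATH = "A.wdl"
INPUT_KEY = "A.input_file"
CROMWELL_URL = "http://localhost:8000"


def build_inputs(sample_path):
    """Return the inputs document for one independent run of workflow A."""
    return {INPUT_KEY: str(sample_path)}


def submit_command(inputs_json_path, wdl_path=WDL_PATH, url=CROMWELL_URL):
    """Build a curl command that POSTs one workflow to a Cromwell server."""
    return [
        "curl", "-sS", "-X", "POST", f"{url}/api/workflows/v1",
        "-F", f"workflowSource=@{wdl_path}",
        "-F", f"workflowInputs=@{inputs_json_path}",
    ]


def plan_submissions(sample_paths, inputs_dir="inputs"):
    """Pair each sample with its inputs file path, inputs, and command."""
    plan = []
    for index, sample in enumerate(sample_paths):
        inputs_path = Path(inputs_dir) / f"run_{index:05d}.json"
        plan.append((inputs_path, build_inputs(sample), submit_command(inputs_path)))
    return plan


def write_and_submit(plan, run=None):
    """Write each inputs file, then hand its command to `run`.

    `run` defaults to printing the command (a dry run); pass
    `subprocess.check_call` to actually submit.
    """
    run = run or (lambda cmd: print(" ".join(cmd)))
    for inputs_path, inputs, command in plan:
        inputs_path.parent.mkdir(parents=True, exist_ok=True)
        inputs_path.write_text(json.dumps(inputs))
        run(command)
```

Calling `write_and_submit(plan_submissions(files))` prints one command per file; passing `run=subprocess.check_call` would submit them for real. Each submission is a separate workflow, so a failure in one does not fail the others, but this does not by itself address the B-before-C ordering asked about in the original question.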