Cromwell is slow to submit jobs in large scatter on JES

Hi GATK team,
I'm running a WDL script on Cromwell 34 against a JES backend. The WDL is pretty simple: I am filtering a list of VCF files against a BED file of sites I want to keep. The WDL looks like this: https://gist.github.com/weinstockj/30e0d99d11e9a2633cf7602b74cbf5fe
Cromwell is very slow to submit jobs. I am using a very large input TSV of VCFs (39K files). By slow, I mean that I submitted the workflow to a beefy Cromwell VM, and it takes over an hour for any JES jobs to spin up. I experienced the same issue when I previously ran this workflow on Cromwell 31. I'm running Cromwell in server mode, and after workflow submission it shows very little CPU activity. When I run this workflow with a small number of VCF files (100 VCFs), I do not experience the slow job submission. Is there a way that I can restructure the WDL to avoid the slow job submission (beyond splitting things up into smaller batches)?
Thanks,
Josh
Answers
Hey Josh,
It seems likely that Cromwell is checking for potential cache hits very slowly before submitting each job. A few questions:
1. Is call caching enabled?
2. If yes, can you retry the workflow with “read_from_cache” set to false in your workflow options?
3. Are all these input files living in GCS?
4. Would you be okay with sharing your WDL? I can check if something jumps out as an obviously expensive operation for Cromwell.
Thanks!
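For anyone wanting to try the second suggestion, here is a minimal sketch in Python that builds a workflow options file with cache reads turned off. Only the `read_from_cache` key comes from the post above; the function name and the `options.json` file name are illustrative choices, not anything Cromwell requires.

```python
import json


def workflow_options_json(read_from_cache=False):
    """Return the text of a workflow options file that toggles cache reads."""
    # "read_from_cache" is the workflow option named in the answer above.
    # With it set to false, the workflow should skip looking for cache hits.
    options = {"read_from_cache": read_from_cache}
    return json.dumps(options, indent=2)


if __name__ == "__main__":
    # "options.json" is a hypothetical file name used for illustration.
    with open("options.json", "w") as handle:
        handle.write(workflow_options_json())
```

The resulting file would then be supplied alongside the WDL and inputs at submission time (in server mode, as the `workflowOptions` part of the submission request).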
I was able to get Cromwell to submit jobs (much) faster after removing the call to the "size" function on line 45 of the WDL, which I gather is an expensive operation. My current takeaway is to avoid calling "size" inside a scatter.
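One possible way to keep size-dependent settings without calling "size" in the scatter is to compute the sizes once, outside Cromwell, and carry them in the input TSV. The sketch below is only an illustration under assumptions: it assumes "size" was being used to pick a per-shard disk request, and that the sizes come from the text output of `gsutil ls -l`. The function names, the 10 GB padding, and the two-column TSV layout are all hypothetical.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def parse_gsutil_listing(text):
    """Map each gs:// path to its size in bytes from `gsutil ls -l` output."""
    sizes = {}
    for line in text.splitlines():
        # Object lines look like: "<bytes>  <timestamp>  gs://bucket/object".
        # Splitting at most twice keeps any spaces inside the object name.
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[2].startswith("gs://"):
            sizes[parts[2]] = int(parts[0])
    return sizes  # the trailing "TOTAL: ..." summary line is skipped


def disk_gb(size_bytes, padding_gb=10):
    """Round the file size up to whole GiB and add fixed headroom (assumed)."""
    return math.ceil(size_bytes / GIB) + padding_gb


def tsv_rows(sizes, padding_gb=10):
    """Build 'path<TAB>disk_gb' rows, sorted by path, for the input TSV."""
    return [
        "{}\t{}".format(path, disk_gb(n, padding_gb))
        for path, n in sorted(sizes.items())
    ]
```

The WDL could then read the second column as an ordinary per-file input, so Cromwell does not have to look up the size of 39K files itself before submitting jobs.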
Here is the relevant part of my configuration file:
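```
filesystems {
  local {
    # Strategies tried, in order, when localizing input files
    localization: [ "soft-link", "hard-link", "copy" ]
    caching {
      # Strategies tried, in order, when copying outputs of a cached call
      duplication-strategy: [ "soft-link", "hard-link", "copy" ]
      # Hash files by path and modification time rather than by content
      hashing-strategy: "path+modtime"
      # Use a sibling .md5 file as the hash when one exists
      check-sibling-md5: true
    }
  }
}
```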