Best practices for large scatter operations
I'm using Cromwell 28.2 on Ubuntu 14.04, with Google JES as the backend, to run a scatter operation over a large array of files. When scattering over 500 files, Cromwell uses about 5-6 GB of RAM. When scattering over ~3000 files, it uses around 30 GB. That is roughly 10 MB per file in both cases, which suggests to me that Cromwell's RAM usage scales linearly with the number of scattered instances.
I'd like to use Cromwell for much larger scatter operations, say 60,000 files. Extrapolating from the numbers above, this would take about 600 GB of RAM.
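The extrapolation can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes memory grows strictly in proportion to the number of scattered files, and it takes 5.5 GB as the midpoint of the reported 5-6 GB range. The function names are illustrative.

```python
# Observed (files scattered, approximate Cromwell RAM in GB) from the two runs.
# 5.5 is an assumed midpoint of the reported 5-6 GB range.
observations = [(500, 5.5), (3000, 30.0)]


def gb_per_file(n_files, ram_gb):
    """Average RAM cost per scattered file, in GB."""
    return ram_gb / n_files


def extrapolate(n_files, rate_gb_per_file):
    """Projected RAM in GB, assuming purely linear scaling."""
    return n_files * rate_gb_per_file


rates = [gb_per_file(n, ram) for n, ram in observations]
estimate_low = extrapolate(60_000, min(rates))
estimate_high = extrapolate(60_000, max(rates))

print(f"per-file cost: {min(rates) * 1000:.0f}-{max(rates) * 1000:.0f} MB")
print(f"projected RAM for 60,000 files: {estimate_low:.0f}-{estimate_high:.0f} GB")
```

Both runs give a per-file cost of about 10-11 MB, so the linear projection for 60,000 files lands in the 600-660 GB range.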
1. Are there recommendations for reducing Cromwell's RAM usage for very large jobs?
1a. If not, is splitting up a large job into smaller batches of jobs the only option?
2. Is the above RAM usage in line with expectations?
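For reference, the batching fallback mentioned in 1a could look something like the sketch below. It assumes each batch would be submitted as a separate workflow run with its own inputs JSON. The bucket paths, the batch size of 500, and the input name `my_workflow.input_files` are all hypothetical placeholders, not taken from the actual workflow.

```python
import json


def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical input file paths; replace with the real list of files.
files = [f"gs://example-bucket/sample_{i}.bam" for i in range(60_000)]

# 500 files per batch matches the smaller run described above (an assumed choice).
batches = make_batches(files, 500)

# One inputs JSON document per batch. "my_workflow.input_files" is a
# hypothetical workflow input name used only for illustration.
inputs_documents = [
    json.dumps({"my_workflow.input_files": batch}) for batch in batches
]

print(f"{len(batches)} batches of up to 500 files each")
```

Each entry in `inputs_documents` could then be written to disk and submitted as its own workflow, so that no single run scatters over more than 500 files.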