Best practices for large scatter operations
I'm using Cromwell 28.2 on Ubuntu 14.04, with Google JES as the backend, to run a scatter operation over a large array of files. When scattering over 500 files, Cromwell uses about 5-6 GB of RAM; when scattering over ~3,000 files, it uses around 30 GB. This suggests that Cromwell's RAM usage scales roughly linearly with the number of scatter shards, at about 10 MB per shard.
I'd like to use Cromwell for much larger scatter operations - say 60,000 files. The above investigations suggest that this will take 600 GB of RAM.
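The extrapolation above can be sketched as a quick back-of-the-envelope calculation (using the 3,000-file run as the reference point; the linear-scaling assumption is mine, based only on the two observations):

```python
# Observed: ~30 GB of Cromwell RAM for a 3,000-shard scatter.
per_shard_gb = 30.0 / 3000  # ~0.01 GB (~10 MB) per shard, assuming linear scaling

def projected_ram_gb(n_shards: int) -> float:
    """Project Cromwell RAM usage for a scatter of n_shards under linear scaling."""
    return n_shards * per_shard_gb

print(projected_ram_gb(60_000))  # → 600.0 (GB)
```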
1. Are there recommendations for reducing Cromwell's RAM usage for very large jobs?
    a. If not, is splitting a large job into smaller batches of jobs the only option?
2. Is the above RAM usage in line with expectations?
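In case it helps frame question 1a: one way to batch is to split the input file list into chunks and submit each chunk as its own workflow run. A minimal sketch of the chunking step, assuming submissions are driven externally (the file paths and batch size are illustrative, not from a real run):

```python
from typing import Iterator, List

def chunk_files(files: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive batches of at most batch_size files."""
    for start in range(0, len(files), batch_size):
        yield files[start:start + batch_size]

# Example: 60,000 inputs in batches of 3,000 (~30 GB of Cromwell RAM each,
# per the scaling observed above) gives 20 separate workflow submissions.
files = [f"gs://bucket/input_{i}.bam" for i in range(60_000)]
batches = list(chunk_files(files, 3_000))
print(len(batches))  # → 20
```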