Best practices for large scatter operations
I'm using Cromwell 28.2 on Ubuntu 14.04, with Google JES as the backend, to run a scatter operation over a large array of files. When scattering over 500 files, Cromwell takes up about 5-6 GB of RAM. When scattering over ~3000 files, Cromwell takes up roughly 30 GB. This suggests to me that Cromwell's RAM usage scales linearly with the number of scatter shards.
I'd like to use Cromwell for much larger scatter operations - say 60,000 files. Extrapolating linearly from the measurements above, that would require on the order of 600 GB of RAM.
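For reference, here's the back-of-the-envelope extrapolation. The ~10-12 MB-per-shard figure is my own estimate derived from the runs above, not a documented Cromwell constant:

```python
# Per-shard memory estimate from my observations (shards -> peak RAM in GB).
# These are measurements from my runs, not Cromwell documentation.
observations = {500: 6.0, 3000: 30.0}

# 6/500 = 0.012 GB/shard, 30/3000 = 0.010 GB/shard -> average ~0.011 GB/shard
per_shard_gb = sum(ram / shards for shards, ram in observations.items()) / len(observations)

projected_gb = 60_000 * per_shard_gb
print(f"~{per_shard_gb * 1024:.0f} MB per shard, ~{projected_gb:.0f} GB for 60,000 shards")
```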
1. Are there recommendations for reducing Cromwell's RAM usage for very large jobs?
   1a. If not, is splitting a large job into smaller batches of jobs the only option?
2. Is the above RAM usage in line with expectations?
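If batching (1a) is the answer, the kind of thing I have in mind on the submission side looks like this. This is only a sketch: `submit` here is a placeholder callback, not a real Cromwell API, and the batch size of 3000 is just a guess based on what my server can currently handle:

```python
# Hypothetical batching sketch: split a large input list into chunks and
# submit one workflow per chunk, instead of one 60,000-wide scatter.
# submit is a stand-in for whatever actually launches a Cromwell workflow.

def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def submit_in_batches(files, batch_size=3000, submit=print):
    """Launch one workflow per batch via the `submit` callback."""
    for batch_num, batch in enumerate(chunked(files, batch_size)):
        submit(f"batch {batch_num}: {len(batch)} files")

submit_in_batches([f"sample_{i}.bam" for i in range(60_000)])
# 60,000 files at batch_size=3000 -> 20 batches
```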