For WDL questions, see the WDL specification and WDL docs.
For Cromwell questions, see the Cromwell docs and please post any issues on GitHub.
How much memory does Cromwell need for input or output files?
I am running Cromwell in a SLURM environment, where jobs are killed if they exceed their requested memory. My workflow jobs are failing, and I am trying to understand why.
I have the following working separately:
- bcl2fastq on SLURM, with 2 cores, 16 GB memory
- Cromwell with a simple workflow on SLURM, using a backend modified from LSF
I am now trying to combine the two, but my jobs fail, at least partly because of insufficient memory and timeouts. I do not understand when input or output is written to disk versus held in memory, or which process needs that memory (the Cromwell process or the task process). All I know so far is that running bcl2fastq under Cromwell fails somewhere, and at least one error message from a child process is a SLURM out-of-memory error.
bcl2fastq takes as input a directory, which I am passing to the WDL task as a String. The child process stderr indicates the input files can be read normally.
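For context, a task of the kind described above might look roughly like the sketch below. This is not the actual workflow: it assumes WDL 1.0, and the task name, the input names (`run_dir`, `cpus`, `memory_gb`), the `fastq_out` directory, and the glob pattern are all hypothetical. It also assumes the SLURM backend configuration maps the `cpu` and `memory` runtime attributes onto the job submission.

```wdl
version 1.0

# Hypothetical sketch; names and layout are assumptions, not the real workflow.
task bcl2fastq_demux {
  input {
    # Run folder passed as a String (a path), not a File, so the engine
    # treats it as plain text rather than as a file to stage.
    String run_dir
    Int cpus = 2
    Int memory_gb = 16
  }

  command <<<
    # Write FASTQs into a subdirectory of the task's working directory.
    bcl2fastq --runfolder-dir ~{run_dir} --output-dir fastq_out
  >>>

  output {
    # Assumed pattern: collects only FASTQs at the top level of fastq_out;
    # files in per-project subdirectories would need additional globs.
    Array[File] fastqs = glob("fastq_out/*.fastq.gz")
  }

  runtime {
    # Requests for the task's own SLURM job, separate from the Cromwell process.
    cpu: cpus
    memory: "~{memory_gb} GB"
  }
}
```

In this sketch the `runtime` block describes the resources for the bcl2fastq job only; whatever memory the Cromwell server process itself needs would be requested separately, wherever Cromwell is launched.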
The output of bcl2fastq is large when run without Cromwell: tens of GB. With Cromwell, I see no data output to the execution directory.
How much memory does Cromwell need to handle a WDL task with bcl2fastq? Is this 1x output? 2x output? Does this memory need to be allocated to the SLURM process running Cromwell itself, or to the job running bcl2fastq, or to both? For a task consuming this output, how much memory is required beyond what is needed without Cromwell?
If I use a MySQL database instead of the in-memory database, how is this memory affected? Is task output data stored in the database, or is this only job metadata?
Although my example here is bcl2fastq, I also have other custom processing that ingests and outputs similarly large data sets, so I am generally looking for guidance on memory usage for Cromwell and its child jobs scaling with input and output data size.