How much memory does Cromwell need for input or output files?

I am currently running Cromwell in a SLURM environment, where jobs fail if they exceed their requested memory limits. My workflow jobs are failing, and I am trying to understand why.

I have the following working:

  1. bcl2fastq on SLURM, with 2 cores, 16 GB memory
  2. Cromwell with a simple workflow on SLURM, with a backend configuration modified from LSF

I am now trying to combine these two working pieces, but my jobs are failing, at least partially due to insufficient memory and timeout errors. I do not understand when input or output is written to disk versus held in memory, or which process needs this memory (the Cromwell process or the task process). All I know at present is that combining bcl2fastq with Cromwell fails somewhere, and at least one error message from a child process shows a SLURM out-of-memory error.

bcl2fastq takes as input a directory, which I am passing to the WDL task as a String. The child process's stderr indicates the input files can be read normally.
The output of bcl2fastq is large when run without Cromwell: tens of GB. With Cromwell, I see no output data in the execution directory.
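
Roughly, the task looks like this (the task name, output subdirectory, and exact bcl2fastq flags here are simplified placeholders, not my exact script):

    task bcl2fastq {
        String run_dir        # run folder passed as a String, not a File
        Int threads = 2
        String mem = "16 GB"

        command {
            # output directory given relative to the task working directory
            bcl2fastq --runfolder-dir ${run_dir} --output-dir fastq_out
        }

        output {
            Array[File] fastqs = glob("fastq_out/*.fastq.gz")
        }

        runtime {
            cpu: threads
            memory: mem
        }
    }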

How much memory does Cromwell need to handle a WDL task with bcl2fastq? Is it 1x the size of the output? 2x? Does this memory need to be allocated to the SLURM process running Cromwell itself, to the job running bcl2fastq, or to both? For a task consuming this output, how much memory is required beyond what is needed without Cromwell?

If I use a MySQL database instead of the in-memory database, how does this affect memory usage? Is task output data stored in the database, or only job metadata?

Although my example here is bcl2fastq, I also have other custom processing that ingests and outputs similarly large data sets, so I am generally looking for guidance on how memory usage for Cromwell and its child jobs scales with input and output data size.

Answers

  • KateN, Cambridge, MA (Member, Broadie, Moderator, admin)

    So it looks like you know that you need 10 GB of memory for your job. Are you specifying this using the runtime attribute in your WDL script? If you are, are you sure that the memory attribute is correctly being added to the SLURM command line? (Given that you are using this modified backend, your memory specification could be getting hung up here.) Have you tried testing the configuration with a smaller command, one that would normally use far less than 10 GB of memory?

    And unless you are using read_string, Cromwell won't ever read the input files into memory itself, so you don't need to worry about that using up your compute resources.
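
    A minimal sketch of the kind of setup meant here (the values are placeholders): the runtime block is what should end up in your SLURM submit line, and only something like the commented-out read_string line would make Cromwell itself load file contents into its own memory.

        task mem_check {
            command {
                echo "hello" > message.txt
            }

            runtime {
                cpu: 2
                memory: "16 GB"   # this value should surface in execution/script.submit
            }

            output {
                File message = "message.txt"
                # String text = read_string("message.txt")
                # (read_string would pull the file contents into Cromwell's own JVM memory)
            }
        }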

  • mmah (Member, Broadie)

    I have sorted out the issue with output data not appearing in the execution directory (bcl2fastq's default output directory was not the current directory), but this is separate from the memory issue.

  • mmah (Member, Broadie)

    I am pretty sure the memory attributes are being added to the SLURM command line correctly. I have checked this in the execution/script.submit file, and also discovered that some runtime variable names are not allowed: https://github.com/broadinstitute/cromwell/issues/2068. I have tested a simple WDL workflow with trivial python scripts that runs successfully with Cromwell on SLURM, roughly like the sketch below.
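
    Roughly the kind of smoke test I mean (hypothetical task and file names; the real scripts differ slightly):

        workflow slurm_smoke_test {
            call count_lines
        }

        task count_lines {
            File infile

            command {
                # trivial python one-liner standing in for the real processing
                python -c "print(sum(1 for _ in open('${infile}')))" > count.txt
            }

            output {
                Int n = read_int("count.txt")
            }

            runtime {
                cpu: 1
                memory: "1 GB"
            }
        }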

    I ask about how memory usage scales with input/output size because this statement in the Cromwell readme is ambiguous about what gets stored in the database.

    Cromwell uses either an in-memory or MySQL database to track the execution of workflows and store outputs of task invocations.

    If memory usage does not scale with input/output at all, then I am probably simply below the minimum I need somewhere.
