Can I retry job submissions that sometimes fail?

mmah Member, Broadie

I am using Cromwell v26, running on SLURM. The SLURM system I am using currently suffers from an annoying bug where some job submissions fail due to a socket timeout. This error is transient, and retrying seems to always result in success.

In the backend configuration file, the submit command looks like:

submit = """
    sbatch --wrap "/bin/bash ${script}"
"""

The sbatch command to SLURM is timing out.

The latest blog entry seems to indicate that Cromwell will retry some operations.
https://software.broadinstitute.org/wdl/blog?id=9362

Can the job submission operation be configured to retry?
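In case it helps frame the question, here is the kind of workaround I could fall back on outside of Cromwell: a small shell wrapper that reruns the submit command until it succeeds. This is only a sketch under assumptions, not a Cromwell feature. The function name `retry` and the `MAX_ATTEMPTS` and `RETRY_DELAY` variables are hypothetical names of my own. Output is printed only for the attempt that succeeds, on the assumption that Cromwell reads the job ID from the submit command's stdout.

```shell
#!/bin/bash
# retry: run a command up to MAX_ATTEMPTS times, sleeping between failures.
# MAX_ATTEMPTS and RETRY_DELAY are hypothetical knobs, not Cromwell settings.
retry() {
    local max_attempts="${MAX_ATTEMPTS:-5}"
    local delay="${RETRY_DELAY:-10}"
    local attempt=1
    local output
    while true; do
        # Capture stdout so that only the successful attempt's output is printed.
        if output="$("$@")"; then
            printf '%s\n' "$output"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed, retrying in ${delay}s: $*" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# When the script is called with arguments, retry them as a command.
if [ "$#" -gt 0 ]; then
    retry "$@"
fi
```

Saved as, say, submit-retry.sh (a made-up file name), the submit line would become something like `/path/to/submit-retry.sh sbatch --wrap "/bin/bash ${script}"`. One caveat with this approach: if sbatch times out after SLURM has already accepted the job, a retry could submit it twice, so I would still prefer a supported retry option if one exists.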


Issue · Github
by Geraldine_VdAuwera

Issue Number: 1986
State: open
