Update: July 26, 2019
This section of the forum is now closed; we are working on a new support model for WDL that we will share here shortly. For Cromwell-specific issues, see the Cromwell docs and post questions on Github.

Can I retry job submissions that sometimes fail?

mmah (Member, Broadie)

I am using Cromwell v26, running on SLURM. The SLURM system I am using currently suffers from an annoying bug where some job submissions fail due to a socket timeout. This error is transient, and retrying seems to always result in success.

In the backend configuration file, the submit command looks like:

submit = """
    sbatch --wrap "/bin/bash ${script}"
"""

The sbatch command to SLURM is timing out.

The latest blog entry seems to indicate that Cromwell will retry some operations.
https://software.broadinstitute.org/wdl/blog?id=9362

Can the job submission command be configured to retry when it fails?
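
Independent of whatever retry support Cromwell itself offers, one possible workaround is to wrap the sbatch call in a small retry loop. The sketch below is only an illustration: the `retry` function, its attempt count and its delay are hypothetical names and values, not Cromwell or SLURM features. It keeps retry messages on stderr so that the stdout of the successful sbatch call (which Cromwell reads to find the job ID) is left untouched.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Cromwell or SLURM):
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS attempts have failed.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            # Diagnostics go to stderr so stdout only carries the command's output.
            echo "retry: giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt ${attempt} failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Assumed usage in place of the bare sbatch call (5 attempts, 10 s apart):
#   retry 5 10 sbatch --wrap "/bin/bash ${script}"
```

One caveat with this approach: if the timeout occurs after SLURM has already accepted the job, a retry could submit the same job twice, so it is worth checking how the failure behaves on your cluster before relying on it.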


GitHub issue filed by Geraldine_VdAuwera
Issue number: 1986
State: open
