
Cromwell 28.2 Consistently Fails after Preempted Scatter

I have been noticing odd behaviour when running scatter jobs with Cromwell 28 on JES.

When the final task inside a scatter block (the one just before the gather) is preempted, the next dependent step outside the scatter block starts anyway. At the same time, the second attempt of the preempted task starts, and the entire run fails.

In this scenario I am sorting BAM files in a scatter and then gathering the sorted BAM files in a merge step.

workflow wf {
  Array[File] input_bams
  scatter (bam in input_bams) {
    call Sort {
      input:
        bam = bam
    }
  }
  call MergeBams {
    input:
      bams = Sort.sorted_bam
  }
}

When the call to Sort is preempted, the outputs of that task are not resolved, and an empty list is passed to the bams input of MergeBams. MergeBams then starts but fails to delocalize its files because there were no input files to begin with.

The relevant excerpts from the metadata are below:

"wf.MergeBamFiles": [{
      "preemptible": true,
      "retryableFailure": false,
      "executionStatus": "Failed",
      "shardIndex": -1,
      "jes": {
      },
      "runtimeAttributes": {
        "preemptible": "4",
        "failOnStderr": "false",
        "bootDiskSizeGb": "10",
        "disks": "local-disk 520 HDD",
        "continueOnReturnCode": "0",
        "cpu": "2",
        "noAddress": "false",
        "memory": "7.5 GB"
      },
      "inputs": {
        "input_bam": [],
        "disk_size": 250,
        "input_bam_index": [],
      },

      "returnCode": 1,

      "failures": [{
        "causedBy": [],
        "message": "Task wf.MergeBamFiles:1 failed. JES error code 5.  Message: 10: Failed to delocalize files: failed to copy the following files: ..."
      }],
      "backend": "JES",
      "end": "2017-09-04T18:15:04.438Z",
      "attempt": 1,
      "executionEvents": [{
        "startTime": "2017-09-04T18:12:06.940Z",
        "description": "RequestingExecutionToken",
        "endTime": "2017-09-04T18:12:06.941Z"
      }, {
        "startTime": "2017-09-04T18:12:06.940Z",
        "description": "Pending",
        "endTime": "2017-09-04T18:12:06.940Z"
      }, {
        "startTime": "2017-09-04T18:12:06.941Z",
        "description": "PreparingJob",
        "endTime": "2017-09-04T18:12:06.953Z"
      }, {
        "startTime": "2017-09-04T18:12:06.953Z",
        "description": "RunningJob",
        "endTime": "2017-09-04T18:15:04.190Z"
      }, {
        "startTime": "2017-09-04T18:15:04.190Z",
        "description": "UpdatingJobStore",
        "endTime": "2017-09-04T18:15:04.437Z"
      }],
      "start": "2017-09-04T18:12:06.940Z"
    }],
"wf.Sort": [{
      "preemptible": true,
      "retryableFailure": true,
      "executionStatus": "RetryableFailure",
      "backendStatus": "Failed",
      "shardIndex": 0,
      "inputs": {
        "input_bam": "....",
      },
      "failures": [{
        "causedBy": [],
        "message": "Task wf.Sort:0:1 failed. JES error code 10. Task 682b59c0-3fc1-4092-b57c-6e40ba9ef82e:Sort was preempted for the 1st time. The call will be restarted with another preemptible VM (max preemptible attempts number is 4). Error code 10. Message: 14: ...."
      }],
      "backend": "JES",
      "end": "2017-09-04T18:12:05.451Z",
      "attempt": 1,
      "executionEvents": [{
        "startTime": "2017-09-04T18:10:21.233Z",
        "description": "RunningJob",
        "endTime": "2017-09-04T18:12:05.344Z"
      }, {
        "startTime": "2017-09-04T18:10:20.811Z",
        "description": "Pending",
        "endTime": "2017-09-04T18:10:20.811Z"
      }, {
        "startTime": "2017-09-04T18:10:20.811Z",
        "description": "RequestingExecutionToken",
        "endTime": "2017-09-04T18:10:20.811Z"
      }, {
        "startTime": "2017-09-04T18:10:20.811Z",
        "description": "PreparingJob",
        "endTime": "2017-09-04T18:10:21.233Z"
      }, {
        "startTime": "2017-09-04T18:12:05.344Z",
        "description": "UpdatingJobStore",
        "endTime": "2017-09-04T18:12:05.451Z"
      }],
      "start": "2017-09-04T18:10:20.811Z"
    }, {
      "preemptible": true,
      "executionStatus": "Running",
      "backendStatus": "Running",
      "shardIndex": 0,
      "jes": {},
      "runtimeAttributes": {},

      "inputs": {
        "input_bam": "..."
      },
      "attempt": 2,
      "start": "2017-09-04T18:12:05.920Z"
    }]
},
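
The timestamps in the excerpt are what show the problem: MergeBamFiles starts at 18:12:06.940 while the second attempt of Sort shard 0 (started 18:12:05.920) is still running. For anyone who wants to check their own metadata for the same pattern, here is a minimal sketch in Python. It assumes the metadata has already been loaded into a dict that maps call names to lists of attempts, shaped like the excerpt above; the function name find_premature_starts and the trimmed-down calls dict are hypothetical and only for illustration.

```python
from datetime import datetime


def parse_ts(ts):
    # Timestamps in the excerpt look like 2017-09-04T18:12:06.940Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def find_premature_starts(calls, upstream, downstream):
    """Return (shardIndex, executionStatus) for each shard of `upstream`
    whose latest attempt had not ended when `downstream` first started."""
    downstream_start = min(parse_ts(a["start"]) for a in calls[downstream])

    # Keep only the highest-numbered attempt for each shard.
    latest = {}
    for attempt in calls[upstream]:
        shard = attempt["shardIndex"]
        if shard not in latest or attempt["attempt"] > latest[shard]["attempt"]:
            latest[shard] = attempt

    problems = []
    for shard, attempt in sorted(latest.items()):
        end = attempt.get("end")  # absent while the attempt is still running
        if end is None or parse_ts(end) > downstream_start:
            problems.append((shard, attempt["executionStatus"]))
    return problems


# Values copied from the metadata excerpt, trimmed to the fields used here.
calls = {
    "wf.MergeBamFiles": [
        {"shardIndex": -1, "attempt": 1, "executionStatus": "Failed",
         "start": "2017-09-04T18:12:06.940Z", "end": "2017-09-04T18:15:04.438Z"},
    ],
    "wf.Sort": [
        {"shardIndex": 0, "attempt": 1, "executionStatus": "RetryableFailure",
         "start": "2017-09-04T18:10:20.811Z", "end": "2017-09-04T18:12:05.451Z"},
        {"shardIndex": 0, "attempt": 2, "executionStatus": "Running",
         "start": "2017-09-04T18:12:05.920Z"},
    ],
}

print(find_premature_starts(calls, "wf.Sort", "wf.MergeBamFiles"))
# prints [(0, 'Running')]
```

An empty result would mean every shard's latest attempt had finished before the downstream call began, which is what I would expect from a correct gather.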
