Error when aborting workflow

gordon123 Broad Member, Broadie

I launched a single-task workflow, which ran with operations id: EIGzm7-SKxjd85qrosWuiVEg8J7JsoIWKg9wcm9kdWN0aW9uUXVldWU

The algorithm takes a long time to run (we need to fix that). I saw it was still running after 10 days, so I clicked the abort button. A couple of days later, the workflow is still in the "Aborting" state. Eddie checked the state of JES using the operations id, and this showed that the job had been killed after 6 days for taking too long to run, so the VM had already been taken down. I think a 6-day time limit is reasonable, but the status should be correctly reflected in FireCloud.

Issue · GitHub
by Geraldine_VdAuwera

Issue Number: 1606
State: closed
Closed By: vdauwera

Answers

  • esalinas Broad Member, Broadie ✭✭✭

    Invoking "gcloud alpha genomics operations describe" showed the info:

    wm8b1-75c:~ esalinas$ gcloud alpha genomics operations describe EIGzm7-SKxjd85qrosWuiVEg8J7JsoIWKg9wcm9kdWN0aW9uUXVldWU
    done: true
    error:
      code: 1
      message: Operation canceled at 2016-12-28T11:04:05-08:00 because it is older than
        6 days
    metadata:
      '@type': type.googleapis.com/google.genomics.v1.OperationMetadata
      clientId: ''
      createTime: '2016-12-22T18:59:17Z'
      endTime: '2016-12-28T19:04:05Z'
      events:
      - description: start
        startTime: '2016-12-22T19:00:01.993545606Z'
      - description: pulling-image
        startTime: '2016-12-22T19:00:03.701259772Z'
      - description: localizing-files
        startTime: '2016-12-22T19:02:49.921198073Z'
      - description: running-docker
        startTime: '2016-12-22T20:25:30.078280997Z'
      labels: {}
      projectId: broad-firecloud-testing
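
    For anyone who wants to check an operation like this without reading the output by eye, here is a rough Python sketch. It assumes the same record is available as JSON (for example via gcloud's global `--format=json` flag) with the same keys as in the output above; the helper name `summarize_operation` is made up for illustration, and the sample record is trimmed to the fields the sketch uses.

```python
import json
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches createTime/endTime in the output above


def parse_utc(stamp: str) -> datetime:
    """Parse a second-resolution UTC timestamp such as '2016-12-22T18:59:17Z'."""
    return datetime.strptime(stamp, TIME_FORMAT).replace(tzinfo=timezone.utc)


def summarize_operation(op: dict) -> dict:
    """Hypothetical helper: reduce an operation record to a few status fields."""
    meta = op.get("metadata", {})
    error = op.get("error")
    elapsed = None
    if "createTime" in meta and "endTime" in meta:
        elapsed = parse_utc(meta["endTime"]) - parse_utc(meta["createTime"])
    return {
        "finished": bool(op.get("done")),
        "failed": error is not None,
        "message": error.get("message") if error else None,
        "elapsed": elapsed,
    }


# Sample record: values copied from the output above, other fields omitted.
sample = json.loads("""
{
  "done": true,
  "error": {
    "code": 1,
    "message": "Operation canceled at 2016-12-28T11:04:05-08:00 because it is older than 6 days"
  },
  "metadata": {
    "createTime": "2016-12-22T18:59:17Z",
    "endTime": "2016-12-28T19:04:05Z"
  }
}
""")

summary = summarize_operation(sample)
print(summary["finished"], summary["failed"])
print(summary["elapsed"])
```

    With the values from this operation, the elapsed time between `createTime` and `endTime` comes out just over 6 days, which is consistent with the cancellation message.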

  • gordon123 Broad Member, Broadie

    BTW - this is still in the aborting state. I suspect it will stay that way permanently unless someone tweaks something. Hopefully the VM is not being billed during this time.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi @gordon123, it sounds like we have a spate of transient issues of this type; we're now keeping track of occurrences to evaluate impact. If you experience the same problem again, can you please post a comment in this thread?

    You make a good point about the VM hopefully not getting billed -- I'm going to check with the eng team whether this is something we need to worry about. Based on the JES info Eddie posted I assume not, but better safe than sorry.

  • xiaolicbs Broad Institute Member

    Hi @Geraldine_VdAuwera, I have run into an issue similar to the one Gordon described. I have a job that has been failing to abort for a month now. I didn't get billed for it. I just want to let you know that this issue still exists.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @xiaolicbs We've fixed the cause of this problem, so it shouldn't happen again, but the fix won't correct the erroneous status of existing jobs. To be clear, nothing will update that status unless we take steps to clean the database, and I'm not aware of any plans to do so.
