message: insufficient data written

cbao Member, Broadie ✭✭

Hi,
I got the following message. One of the tasks failed, but some other tasks are still running. This is not the first time it has happened. How can I stop them and fix it? Thanks.

message: Unable to complete JES Api Request
causedBy:
message: insufficient data written

Best,
Chunyang

Answers

  • cbao Member, Broadie ✭✭

    Just like this: (In this case, is Call #9 still running?)

    Call #7:
    Operation: operations/EMynwcfcKxjD_-Kbr-W6s5kBIMj7h5ORDioPcHJvZHVjdGlvblF1ZXVl
    Status: Failed
    Started: August 10, 2017, 4:24 AM (5 hours ago)
    Ended: August 10, 2017, 4:28 AM (5 hours ago)
    Inputs: Show
    Outputs: None
    stdout: Mutect2_Task-3-stdout.log
    stderr: Mutect2_Task-3-stderr.log
    JES log: Mutect2_Task-3.log
    Failures: Show
    Call #8:
    Operation: operations/ELykxcXcKxjnkZDbz8aNlmAgyPuHk5EOKg9wcm9kdWN0aW9uUXVldWU
    Status: RetryableFailure
    Cache Result: Miss
    Started: August 10, 2017, 4:23 AM (5 hours ago)
    Ended: August 10, 2017, 4:23 AM (5 hours ago)
    Inputs: Show
    Outputs: None
    stdout: Mutect2_Task-4-stdout.log
    stderr: Mutect2_Task-4-stderr.log
    JES log: Mutect2_Task-4.log
    Failures: Show
    Call #9:
    Operation: operations/ELDZz8fcKxjmjcinzMuvnrMBIMj7h5ORDioPcHJvZHVjdGlvblF1ZXVl
    Status: Running
    Started: August 10, 2017, 4:24 AM (5 hours ago)
    Ended: Pending...
    Inputs: Show
    Outputs: None
    stdout: Mutect2_Task-4-stdout.log
    stderr: Mutect2_Task-4-stderr.log
    JES log: Mutect2_Task-4.log

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    Call #9 does appear to still be running, but I'm unsure about the error message you are receiving. Is the error message shown under Call #7?

  • Thib Cambridge Member, Broadie, Dev ✭✭

    Hi Chunyang,

    When a call fails, the workflow fails immediately. Running jobs keep running, but they are no longer tracked by the system, so even though they will eventually finish they will still show up in the "Running" state. We know this is not great behavior, and it is on our list of things to improve.
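
    To make that concrete, here is a minimal sketch of that behavior in Python (purely illustrative, not Cromwell's actual code; the call names, timings, and status bookkeeping are made up). Once one call fails, nothing polls the others anymore, so a call that keeps executing never has its status updated past "Running":

    import concurrent.futures
    import time

    def run_call(name, seconds, fail=False):
        # Stand-in for a JES call; "fail" simulates the insufficient-data-written error.
        time.sleep(seconds)
        if fail:
            raise RuntimeError(f"{name}: insufficient data written")

    # The "metadata" view a user sees in the UI.
    status = {"Call #7": "Running", "Call #9": "Running"}

    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(run_call, "Call #7", 1, fail=True): "Call #7",
            pool.submit(run_call, "Call #9", 5): "Call #9",
        }
        for future in concurrent.futures.as_completed(futures):
            name = futures[future]
            try:
                future.result()
                status[name] = "Succeeded"
            except RuntimeError:
                status[name] = "Failed"
                break  # the workflow fails immediately and stops tracking the rest

    # Call #9 eventually finishes, but nothing records that, so it still reads "Running".
    print(status)  # {'Call #7': 'Failed', 'Call #9': 'Running'}
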
    Does this error happen on the same call every time, or does it change?

    Thibault

  • cbao Member, Broadie ✭✭

    Thank you for your kind reply.

    This error happens on a different call every time. It does change.

  • cbao Member, Broadie ✭✭

    @KateN said:
    Call 9 does appear to still be running, but I'm unsure on the error message you are receiving. Is the error message shown under Call 7?

    Yes, Call #7 is the one that failed.

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    One of our developers is investigating your issue right now; either they will update you or I will update you once we have something for you.

  • cbao Member, Broadie ✭✭

    @KateN said:
    One of our developers is investigating your issue right now; either they will update you or I will update you once we have something for you.

    Thank you!

  • Ruchi Member, Broadie, Dev admin
    edited August 2017

    Hey @cbao, what is the name of your workspace? Thanks!
    Edit: If you haven't done so already, would you mind sharing your workspace with [email protected]?

  • cbao Member, Broadie ✭✭

    @Ruchi said:
    Hey @cbao, what is the name of your workspace? Thanks!
    Edit: If you haven't done so already, would you mind sharing your workspace with [email protected]?

    Thanks! Shared. Please check submission "8245f5b0-9799-454e-b050-e2b7b434fe96"

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    Your submission ID is different from the workspace name. Ruchi will need the name of your workspace in order to find it. It should be in the upper left corner of the page, above the Summary, Data, Analysis, etc. tabs, when viewing the workspace Summary page.

  • cbao Member, Broadie ✭✭

    Sorry, here is the name of my workspace:
    Workspace: nci-cbao-bi-org/BN10_WGS_Chunyang_Analysis

  • cbao Member, Broadie ✭✭

    @KateN said:
    Thank you; I've looked into your workflow a bit, and I see the submission you are referring to.

    Some of our Cromwell developers have been looking into your issue, and it seems the problem is due to size limitations. When you submit a job to FireCloud, Cromwell can batch your job with any other jobs that are ready to be created (from other users, other workflows, other billing projects). If the cumulative content of the batched requests is too large, batch creation fails and the jobs in it fail, resulting in an insufficient_data_written error (see the sketch below).

    Currently there are no good workarounds for this error. You can stand up your own instance of Cromwell to avoid having other jobs batched with yours. This is unfortunately difficult, but you can read the Cromwell readme to learn how to set up configuration options and such. Alternatively, you can keep trying to re-run your job; it seems you've been unlucky with being batched with other large jobs.

    It is possible that your own batch contribution could be too large. If you have too many files or overly long file names (for inputs or outputs), you could run out of space even when you are batched with smaller jobs. If you think that applies to your workflow, try making the file names and/or variable names shorter. From looking at your workspace in particular, I don't think this is likely.

    Our Cromwell developers are working on a fix for this problem now, which will be incorporated into Cromwell version 29.1. Unfortunately it will take a few days before that version is released, and more time before it is incorporated into FireCloud. We will be tracking the progress of this fix here, but my naive estimate is a few weeks before this solution is in place.
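
    A rough sketch in Python of the failure mode described above (purely illustrative; the size limit, payloads, and names are assumptions, not Cromwell's implementation or the real JES limit):

    MAX_BATCH_BYTES = 1_000_000  # hypothetical limit; the real JES batch limit differs

    def submit_batch(job_payloads):
        # All jobs that are ready to be created are packed into one batched API request.
        total = sum(len(p.encode("utf-8")) for p in job_payloads)
        if total > MAX_BATCH_BYTES:
            # The whole batch is rejected, so every job in it fails, including yours.
            raise RuntimeError("insufficient data written")
        return [f"operations/{i}" for i, _ in enumerate(job_payloads)]

    # Your job's contribution grows with the number and length of its input/output
    # paths, which is why shortening file and variable names can shrink it.
    my_job = " ".join(f"gs://bucket/long/path/to/input_{i}.bam" for i in range(100))
    other_jobs = ["x" * 300_000 for _ in range(4)]  # large jobs batched alongside yours

    try:
        submit_batch(other_jobs + [my_job])
    except RuntimeError as err:
        print(f"Batch creation failed: {err}")
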

    Thank you so much! I hope the new version will be released soon. Anyway, only one of my pairs failed, so I think I can wait a few weeks.
