Error: Lock wait timeout exceeded; try restarting transaction

Chip 415M 4053 Member, Broadie

This evening all FC operations (loading pair_set entities or launching jobs) in workspace rebc-oct16/rebc_template have resulted in a message:

Error: Lock wait timeout exceeded; try restarting transaction

Taking the message at face value, I tried restarting these "transactions" several times only to observe the same result.

Why is this?

Answers

  • Chip 415M 4053 Member, Broadie

    FC is able to run jobs this morning. No clue what was wrong.

  • francois_a Member, Broadie ✭✭

    I'm getting this error just now.

  • vickyhorst Member, Broadie

    I've been getting the same error since last night, and I still can't run any jobs. Any insights into the fix would be appreciated.

  • Tiffany_at_Broad Cambridge, MA Member, Administrator, Broadie, Moderator admin
    edited March 2018

    Hi @vickyhorst @francois_a @Chip , this error comes up when FireCloud is overloaded with activity. For example, a bunch of submissions have all completed and are overloading the system trying to write their outputs back to the database.

    The best solution at the moment is to come back later (a couple of hours or the next day) and try again. This is not an ideal experience for you and I will make sure to communicate this to the team. Sorry for the inconvenience. We appreciate you letting us know and please continue to let us know if you hit this obstacle again.
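For anyone scripting submissions, the "try again later" advice above could be automated along these lines. This is only a minimal sketch of retrying with exponential backoff, not FireCloud's own client behavior: `TransientServiceError`, `retry_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration, and the `operation` callable stands in for whatever code actually submits the job.

```python
import random
import time


class TransientServiceError(Exception):
    """Hypothetical error type for a retryable failure, such as a lock wait timeout."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (capped at max_delay), with up to
    10% random jitter so that many clients do not all retry at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientServiceError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
```

A caller would wrap its own submission function, for example `retry_with_backoff(lambda: submit_job(config))`, where `submit_job` is assumed to raise `TransientServiceError` on a retryable failure. Backoff of this kind only helps with short bursts of load; it does not help when the service stays overloaded for hours.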

  • RLCollins Harvard Medical School Member ✭✭

    Hi @Tiffany_at_Broad, for what it's worth we've also been encountering this same issue over the course of the morning and early afternoon today.

    The load being reported in the queue status doesn't look too bad (2 queued, 1001 running, wait time < 1 min), but we've been encountering the same issue for hours.

    Any sense as to how long these deadlocking periods last? Or is there a lot of activity on FireCloud right now that isn't being reflected in the queue status?

    Looks like workflows that were running before this deadlocking episode are completing successfully, but we have had no success with launching new workflows today.

    Thanks!
    Ryan

  • francois_a Member, Broadie ✭✭

    Thanks @Tiffany_at_Broad -- can you provide a rough estimate for how long this issue will persist? Waiting for hours/days for jobs to get submitted isn't really an option for much of the work we're doing.

  • nmcabili Broad Member

    Hi everyone,
    We really apologize for the inconvenience. A team has been diagnosing the issue since last night. In the coming quarter, we are going to work on a more permanent solution that should address this problem. Our team will reach out to you by email today to resolve the issues you are currently experiencing.
    Thanks for your understanding,
    Moran

  • Chip 415M 4053 Member, Broadie

    It's happening again Sunday night. Error: Lock wait timeout exceeded; try restarting transaction

  • francois_a Member, Broadie ✭✭

    Seeing this again as well.

  • birger Member, Broadie, CGA-mod ✭✭✭

    This issue is really hindering CGA's ability to do its work. PLEASE give us more details regarding what is being done to address the issue and a time frame (more specific than "in the coming quarter") in which we can expect the problem to be fixed. This is a major regression in the scalability of the system.

  • danielr Broad Institute Member, Broadie

    Hi Firecloud,

    I'm also having the same issue this morning. Any updates on this?

  • lelagina Member, Broadie

    Hi Firecloud team,

    Firecloud doesn't let me launch any jobs now. It gets stuck on "Launching Analysis" and then returns to the Launch Analysis window. There are no error messages.

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    I wanted to reach out to give an update to this thread. Our developers are actively working on a fix right now. I will come back to give an update as soon as I have news on this front.

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    Last night, we were able to restart one of the processes and lift the deadlock symptoms many were experiencing yesterday. It isn't a permanent fix, and our developers are still working on a long-term solution. We are actively monitoring FireCloud for deadlock issues, outages, and slowness. However, if you do experience any of these, please do report them here on the forum as well. Thank you for your patience.

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    Last night, a release went out which includes a fix for the behavior we observed with the outage discussed in this thread. If you are still experiencing the same errors, or if you encounter them again in the future, please let us know.

    For further information on the fix included in the release, please read the release notes here.

  • RobinK Member

    Hi Firecloud,
    I am getting this message this afternoon for workflow id f6a63ea6-8477-4942-aed0-1dde1ce41676 among others. Any thoughts on what could be causing this even after the April 10th update?
    Thanks for your help.

  • francois_a Member, Broadie ✭✭

    I'm getting this error again, in two variants:

    504: Request Timed Out
    and
    Lock wait timeout exceeded; try restarting transaction

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin