amaro · Broad Institute · Member, Broadie

Tacking on another thread to further highlight the issue.

I had a job stuck for 8 hours last night.


  • mleventhal · Cambridge, MA · Member, Broadie ✭✭

    I am experiencing the same issue.

    Workspace: ebert-fc/CVID
    SubmissionID: 484828df-fda9-4155-b818-11c5b946fe67
    WorkflowID: 0f6a1ea5-7c81-474a-b745-7c3d87d04321

  • breardon · Cambridge, MA · Member, Broadie

    Also observing this behavior

  • KateN · Cambridge, MA · Member, Broadie, Moderator admin

    Beginning last night (5/17) around 10 pm EST, we noticed a large influx of jobs. Cromwell is currently running the maximum number of jobs we allow, which means there is a backlog waiting in the QueuedInCromwell state.

    I'm working to find out more information about how long this delay might be, but for now we ask for your patience.

  • amaro · Broad Institute · Member, Broadie

    What is the maximum number of jobs? Where does that max derive from?

  • KateN · Cambridge, MA · Member, Broadie, Moderator admin

    The maximum number of concurrent jobs was based on what Cromwell could handle at once. We are now transitioning to a more flexible model that will calculate the maximum based on the size and complexity of jobs. We are also working to increase the maximum limit through a variety of means. As such, there isn't a hard number I can give you at this point.

    The good news is that we have found one part that was creating a bottleneck in the queue. You may begin to see some of your workflows running. We are still working to find out more, and to increase the load we can handle.

  • amaro · Broad Institute · Member, Broadie

    Why does there need to be a maximum number of jobs? Shouldn't running in Google Cloud allow you to scale Cromwell's resources up or down depending on the workload?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi @amaro, that's the general idea in theory, but in practice even Google Cloud has to set some limits. It seems that we're hitting some of those limits now due to an individual submission that amounts to a very large number of jobs (~60k). We're working with GCP support to figure out a way forward. See the service notice I just posted on the blog and the banner in the FC portal for updates.
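
For anyone wondering why a cap on concurrent jobs turns into hours of waiting, here is a rough toy model of the situation described above. It is only a sketch under simple assumptions (all jobs submitted at once, started in submission order, a fixed cap) and is not how Cromwell actually schedules work. The function name `queue_waits` and its parameters are made up for illustration.

```python
import heapq


def queue_waits(durations, max_concurrent):
    """Return how long each job waits in the queue before it starts.

    Toy model, NOT Cromwell's scheduler: every job is submitted at time 0,
    jobs start in submission order, and at most `max_concurrent` jobs run
    at the same time. Durations and waits share the same (arbitrary) unit.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")

    running = []  # min-heap holding the finish times of running jobs
    waits = []
    for duration in durations:
        if len(running) < max_concurrent:
            start = 0  # a slot is free right away
        else:
            # Queued until the earliest running job finishes and frees a slot.
            start = heapq.heappop(running)
        waits.append(start)
        heapq.heappush(running, start + duration)
    return waits


if __name__ == "__main__":
    # Six one-hour jobs with a cap of two: later jobs sit in the queue.
    print(queue_waits([1, 1, 1, 1, 1, 1], max_concurrent=2))
```

In this model, one very large submission ahead of you fills every slot, so a job submitted afterwards waits in the queue until enough of the earlier jobs finish, regardless of how small it is.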

