Tacking on another thread to further highlight
Had a job stuck for 8 hours last night
I am experiencing the same issue.
Also observing this behavior
Beginning last night (5/17) around 10 pm EST, we noticed a large influx of jobs. Cromwell is currently running the maximum number of jobs we allow, which means there is a backlog waiting in the QueuedInCromwell state.
I'm working to find out more information about how long this delay might be, but for now we ask for your patience.
What is the maximum number of jobs? Where does that max derive from?
The max number of concurrent jobs was based on what Cromwell could handle at once. We are now transitioning to a more flexible model that will calculate the maximum based on the size & complexity of jobs. We are also working to increase the maximum limit through a variety of means. As such, there isn't a hard number I can really give you at this point.
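For anyone wondering how a cap on concurrent jobs turns into a backlog, here is a rough Python sketch. It is not Cromwell's actual scheduler: the cap value, the function names, and the job IDs are all made up for illustration, and the real limit is not stated in this thread.

```python
from collections import deque

MAX_RUNNING = 3  # hypothetical cap; the real limit is not given in this thread


def schedule(job_ids, max_running=MAX_RUNNING):
    """Start jobs until the cap is reached; everything else waits in a queue."""
    running = []
    queued = deque()  # stands in for jobs sitting in a queued state
    for job_id in job_ids:
        if len(running) < max_running:
            running.append(job_id)
        else:
            queued.append(job_id)
    return running, queued


def finish(job_id, running, queued):
    """Remove a finished job and promote the oldest queued job, if any."""
    running.remove(job_id)
    if queued:
        running.append(queued.popleft())


# Five jobs submitted against a cap of three: two of them have to wait.
running, queued = schedule(["a", "b", "c", "d", "e"])

# When one running job finishes, the oldest waiting job takes its slot.
finish("b", running, queued)
```

In this toy model a queued job only starts once a running job finishes, so a single very large submission can hold up everything submitted after it.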
The good news is that we have found one part that was creating a bottleneck in the queue. You may begin to see some of your workflows running. We are still working to find out more, and to increase the load we can handle.
Why does there need to be a maximum number of jobs? Shouldn't running in Google Cloud allow you to scale Cromwell's resources up or down depending on the workload?
Hi @amaro, that's the general idea in theory, but in practice even Google Cloud has to set some limits. It seems that we're hitting some of those limits now due to an individual submission that amounts to a very large number of jobs (~60k). We're working with GCP support to figure out a way forward. See the service notice I just posted on the blog and the banner in the FC portal for updates.