Call QueuedInCromwell, Job Running?

jasongallant1 · East Lansing, MI · Member

Hi there! I've been using FireCloud with great success and just did a preliminary method run on about 275 files using a WDL that I wrote (https://github.com/msuefishlab/pkings_firecloud/blob/master/01_make_unaligned_bam_from_fastq.wdl). A few jobs ran out of disk space, so I upped the memory requested and ran again. A few of these did OK, but the rest did not, so I upped the memory to something large (375GB) and submitted the remaining 75 jobs.

The submission showed a 'Running' state, but individual calls were stuck at a QueuedInCromwell status. Suspecting that the extra disk request was the problem, I figured that dialing it back would fix the issue, so I aborted the submission and restarted it, requesting something more reasonable, like 50GB per sample.
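For reference on the resource requests described above: in WDL run by Cromwell on the Google backend (which FireCloud uses), memory and disk are requested through separate runtime attributes, so a task that runs out of disk space needs a larger `disks` value rather than more `memory`. The fragment below is a minimal sketch assuming draft-2 WDL syntax. The image name and sizes are hypothetical placeholders, not the settings from the linked WDL.

```wdl
runtime {
  # Hypothetical placeholder image; substitute the container the task actually uses
  docker: "ubuntu:16.04"
  # RAM for the worker VM; raising this does not enlarge the working disk
  memory: "7 GB"
  # Working disk on the Google backend: "local-disk <size in GB> <HDD|SSD>".
  # This is the value to raise when a task runs out of disk space.
  disks: "local-disk 50 HDD"
}
```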

I'm now seeing the same issue. I can't see anything obviously wrong in the workflow logs (example included here), though the "workflow timing" section might hold a hint; I've attached a screenshot.

I'm currently a FireCloud newbie, so I'm not sure what to expect of Cromwell. In my experience with the previous jobs, using the same code and the same settings at 10GB and 20GB, execution was swift (about 20 minutes total).

Is there a problem here, or does Cromwell get "busy" from time to time? I've left my 50GB job cooking to see if the issue resolves itself within a few hours.


Best Answers

  • Geraldine_VdAuwera · Cambridge, MA · admin
    Accepted Answer

    Hi @jasongallant1, I'm glad to hear it cleared up. It could have been Cromwell being "busy", as you suspect: as we scale up the system and take in more users, we're finding a few areas where we need to make the system more robust to usage spikes. Our engineers are actively working on it; hopefully we can iron out these bumps in the road so you can just get on with your work!

    Thanks for the nice tweet by the way :)

  • Geraldine_VdAuwera · Cambridge, MA · admin
    Accepted Answer

    Oh, and I should add: we don't yet have much documentation on the internals of the system, but we're now looking to hire someone full-time to address that need.


