How to get the status of running jobs?

bhandsaker Member, Broadie, Moderator admin

Is there some way to get the status of running jobs? I submitted a workflow yesterday (just 4 tasks) which should probably take 4-6 hours to run at most. They have been running for 19 hours so far. But I can't figure out any way to tell if they are running or stuck or what. Is there any way to see partial output?

Workspace: nci-handsake-bi-org/GenomeSTRiP_Biobank_Eggan_MuscleAtrophy_30xWGS_SMAhPSCs
Workflow ID: 1e6e4d3c-3426-431b-a371-203ea785c82e

Answers

  • abaumann Broad DSDE Member, Broadie ✭✭✭

    I looked into this. These workflows don't have operation IDs (so I don't think it's related to the delay described in the banner at the top of the page), and I don't see any reason for the delay (e.g. quota) in the Cromwell logs.

  • bhandsaker Member, Broadie, Moderator admin

    So, are you saying they never ran and my account was not charged?
    Should I kill them and resubmit?
    How can I tell if the job actually starts?

  • abaumann Broad DSDE Member, Broadie ✭✭✭

    Yes, these did not start VMs, so there was no charge. This is an internal Google scaling issue that Google is looking at now. It's delaying the actual start of workflows we submit. If you kill and resubmit, I think you will get the same behavior.

    If the job actually starts, you will see an operation ID for that call and if you look in the timing diagram you should see that there is an initializing VM event.
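    As a rough illustration of that check, here is a minimal sketch that scans workflow metadata for calls that have an operation ID and an initializing event. It assumes you already have the Cromwell-style metadata JSON for the workflow loaded as a dict, with a "calls" map whose attempts carry "jobId", "executionStatus" and "executionEvents" fields. The function name, the sample data, and the exact event wording it matches are assumptions for illustration, not something confirmed in this thread.

```python
from typing import Any, Dict, List


def summarize_calls(metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """List each call attempt and whether a backend job appears to have started."""
    rows = []
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            descriptions = [
                event.get("description", "")
                for event in attempt.get("executionEvents", [])
            ]
            rows.append({
                "call": call_name,
                "status": attempt.get("executionStatus", "Unknown"),
                # Assumption: no "jobId" field means no operation was created yet.
                "operation_id": attempt.get("jobId"),
                # Assumption: the event text contains some form of "initializ...".
                "vm_initializing": any("initializ" in d.lower() for d in descriptions),
            })
    return rows


# Hypothetical metadata fragment, for illustration only.
example = {
    "calls": {
        "wf.task_a": [{
            "executionStatus": "Running",
            "jobId": "operations/EXAMPLE",
            "executionEvents": [{"description": "Initializing VM"}],
        }],
        "wf.task_b": [{"executionStatus": "Running"}],
    }
}

for row in summarize_calls(example):
    print(row)
```

    A call that reports a status but has no operation ID and no initializing event, like the second one in the sample, would match the "submitted but never started" situation described above.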

  • bhandsaker Member, Broadie, Moderator admin

    Thanks, Alex.

    So, are we dead in the water in terms of doing any analyses in FireCloud? What's the ETA for a fix?

    I'm trying to figure out if I need to pay for egress and process these files on-prem.

  • abaumann Broad DSDE Member, Broadie ✭✭✭

    It looks like those failed. Can you try resubmitting? I'm trying to get more info on this from Google.

  • bhandsaker Member, Broadie, Moderator admin

    I resubmitted and they all failed. Here is a sample log message:

    2017-06-19 19:37:46,515 INFO - MaterializeWorkflowDescriptorActor [UUID(409f8348)]: Call-to-Backend assignments: > genomestrip_preprocessing_workflow.genomestrip_preprocessing -> JES
    2017-06-19 19:37:46,583 INFO - JES [UUID(409f8348)]: Creating authentication file for workflow 409f8348-bae3-414c-903e-d5e664906821 at gs://cromwell-auth-nci-handsake-bi-org/409f8348-bae3-414c-903e-d5e664906821_auth.json
    2017-06-19 19:37:48,293 INFO - WorkflowExecutionActor-409f8348-bae3-414c-903e-d5e664906821 [UUID(409f8348)]: Starting calls: genomestrip_preprocessing_workflow.genomestrip_preprocessing:NA:1
    2017-06-19 19:37:48,660 INFO - JesAsyncBackendJobExecutionActor [UUID(409f8348)genomestrip_preprocessing_workflow.genomestrip_preprocessing:NA:1]: $SV_DIR/scripts/firecloud/gs_preprocess.sh RP-1421_PGD1 /cromwell_root/fc-a66d5880-af6e-4feb-8059-a1b36bdadddb/Biobank_Eggan_MuscleAtrophy_30xWGS_SMAhPSCs/RP-1421.PGD1.cram /cromwell_root/broad-gs-references/Homo_sapiens_assembly38_12Oct2016.tar.gz
    2017-06-19 19:40:12,701 ERROR - JesAsyncBackendJobExecutionActor [UUID(409f8348)genomestrip_preprocessing_workflow.genomestrip_preprocessing:NA:1]: Error attempting to Execute
    cromwell.backend.impl.jes.statuspolling.JesApiQueryManager$JesApiException: Unable to complete JES Api Request
    at cromwell.backend.impl.jes.statuspolling.RunCreation$$anon$1.onFailure(RunCreation.scala:25)
    at com.google.api.client.googleapis.batch.json.JsonBatchCallback.onFailure(JsonBatchCallback.java:54)
    at com.google.api.client.googleapis.batch.json.JsonBatchCallback.onFailure(JsonBatchCallback.java:50)
    at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseAndCallback(BatchUnparsedResponse.java:223)
    at com.google.api.client.googleapis.batch.BatchUnparsedResponse.parseNextResponse(BatchUnparsedResponse.java:155)
    at com.google.api.client.googleapis.batch.BatchRequest.execute(BatchRequest.java:253)
    at cromwell.backend.impl.jes.statuspolling.JesPollingActor.runBatch(JesPollingActor.scala:63)
    at cromwell.backend.impl.jes.statuspolling.JesPollingActor.cromwell$backend$impl$jes$statuspolling$JesPollingActor$$handleBatch(JesPollingActor.scala:57)
    at cromwell.backend.impl.jes.statuspolling.JesPollingActor$$anonfun$receive$1.applyOrElse(JesPollingActor.scala:36)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:496)
    at cromwell.backend.impl.jes.statuspolling.JesPollingActor.aroundReceive(JesPollingActor.scala:22)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: cromwell.backend.impl.jes.statuspolling.JesApiQueryManager$GoogleJsonException: the local copy message must have path set.
    ... 20 common frames omitted
    2017-06-19 19:40:15,709 INFO - $i [UUID(409f8348)]: Copying workflow logs from /cromwell-workflow-logs/workflow.409f8348-bae3-414c-903e-d5e664906821.log to gs://fc-c5e492a9-c6fa-42e7-84be-26a56e0691d9/933c8724-6726-42a2-8c7c-808d713bf3a3/workflow.logs/workflow.409f8348-bae3-414c-903e-d5e664906821.log

  • abaumann Broad DSDE Member, Broadie ✭✭✭

    Are you OK if I share this WDL with our team and Google folks to help debug?

  • bhandsaker Member, Broadie, Moderator admin

    Sure, no problem.

  • bhandsaker Member, Broadie, Moderator admin

    OK. Hopefully you will add something so that problems like this are easier for the end user to diagnose.
    I actually set the input to an empty string on purpose (because in the data we got delivered from the platform, they didn't deliver index files - there's a separate ticket about that). But I believe we don't need the index files for preprocessing.
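    Until something like that exists, a pre-submission check along these lines might help catch this kind of input. This is only a sketch: it assumes the workflow inputs are available as a flat name-to-value dict (as in a WDL inputs JSON), and it assumes that an empty-string input is what triggers the "must have path set" error above, which this thread does not spell out. The function name and the input names are hypothetical.

```python
from typing import Any, Dict, List


def find_empty_inputs(inputs: Dict[str, Any]) -> List[str]:
    """Return the names of inputs whose value is an empty or whitespace-only string."""
    return sorted(
        name
        for name, value in inputs.items()
        if isinstance(value, str) and not value.strip()
    )


# Hypothetical inputs, for illustration only.
example_inputs = {
    "wf.task.input_file": "gs://example-bucket/sample.cram",
    "wf.task.input_index": "",
    "wf.task.threads": 4,
}

for name in find_empty_inputs(example_inputs):
    print(f"Input {name} is an empty string; it may not localize as a file.")
```

    Whether an empty value is acceptable depends on how the task declares the input, so this only flags candidates to look at before submitting.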
