Holiday Notice:
The Frontline Support team will be offline February 18 for President's Day but will be back February 19th. Thank you for your patience as we get to all of your questions!

Latest Release: 02/05/19
Release Notes can be found here.

google says job is done but FC/cromwell don't know it is done?

esalinasesalinas BroadMember, Broadie ✭✭✭

I have a task that is "Running" (in FireCloud), .....

ClipReadsWF.CreateSequenceGroupingTSVHide
Call #1:
ID:operations/EMTjxKegKxiRwdyOoJGP8Vwg_eDe0bUCKg9wcm9kdWN0aW9uUXVldWU
Status:
Running
Started:02/03/2017 at 12:06:39 PM (33 minutes ago)
Ended:Pending...
Inputs:Show
Outputs:None
stdout:CreateSequenceGroupingTSV-stdout.log
stderr:CreateSequenceGroupingTSV-stderr.log
JES log:CreateSequenceGroupingTSV.log

...but is done according to Google....

wm8b1-75c:ClipReadsWork esalinas$ gcloud alpha genomics  operations describe operations/EMTjxKegKxiRwdyOoJGP8Vwg_eDe0bUCKg9wcm9kdWN0aW9uUXVldWU|head -20
done: true
metadata:
  '@type': type.googleapis.com/google.genomics.v1.OperationMetadata
  clientId: ''
  createTime: '2017-02-03T17:06:40Z'
  endTime: '2017-02-03T17:08:06Z'
  events:
  - description: start
    startTime: '2017-02-03T17:07:22.289751081Z'
  - description: pulling-image
    startTime: '2017-02-03T17:07:24.776778310Z'
  - description: localizing-files
    startTime: '2017-02-03T17:07:58.787590044Z'
  - description: running-docker
    startTime: '2017-02-03T17:08:02.422473479Z'
  - description: delocalizing-files
    startTime: '2017-02-03T17:08:03.123440248Z'
  - description: copied 1 file(s) to "..."
    startTime: '2017-02-03T17:08:04.806092942Z'
  - description: ok

According to Google it finished over 30 min ago. The task is expected to take no more than 5-6 minutes. According to google the "Running-docker" was from 8:02.42 to 8:03.12 (less than one minute).

The rc file is in the bucket with return code 0.

wm8b1-75c:ClipReadsWork esalinas$ gsutil cat gs://fc-7119068c-40f9-4960-9dee-1d73730e9c1c/8a728caf-ac74-4cc3-a9cf-8b9923cea80c/ClipReadsWF/4d081440-5949-4b66-afc9-b4cbece4f823/call-CreateSequenceGroupingTSV/CreateSequenceGroupingTSV-rc.txt
0

@dlivitz suggested to use the network tab of the chrome developer tools and he saw some info related to call-caching so he suspect that there is something going on with it that is causing the workflow to stall.

      "cache": {
        "allowResultReuse": true
      },
      "Effective call caching mode": "CallCachingOff",

Answers

  • esalinasesalinas BroadMember, Broadie ✭✭✭

    I ran the whole WDL with local cromwell 24 and the workflow ended with status "Succeeded". It also has mention of dead letters encountered. Is that information helpful or relevant?

    [2017-02-03 13:00:55,12] [info] Message [cromwell.subworkflowstore.SubWorkflowStoreActor$SubWorkflowStoreCompleteSuccess] from Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/$b#-1185872290] to Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/WorkflowManagerActor/WorkflowActor-4d1eae5c-1171-4f49-aee4-feb1fea025bd#195713032] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    [2017-02-03 13:01:37,50] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
    {
      "outputs": {
        "ClipReadsWF_GatherBamFiles_output_bam_index": "/Users/esalinas/REBC/ClipReadsWork/cromwell-executions/ClipReadsWF/4d1eae5c-1171-4f49-aee4-feb1fea025bd/call-GatherBamFiles/execution/merged.bai",
    ............
    ...............(removed)
    .................
      },
      "id": "4d1eae5c-1171-4f49-aee4-feb1fea025bd"
    }
    [2017-02-03 13:01:37,56] [info] WorkflowManagerActor: Received shutdown signal.
    [2017-02-03 13:01:37,56] [info] Message [cromwell.engine.workflow.WorkflowManagerActor$AbortAllWorkflowsCommand$] from Actor[akka://cromwell-system/deadLetters] to Actor[akka://cromwell-system/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
    
  • esalinasesalinas BroadMember, Broadie ✭✭✭
    edited February 2017

    Update:

    Though the task finished in google a few minutes after it was submitted, FC marked it as done over two hours later!
    I've seen such marks occurring up to around 5 minutes or so after the task completed, but never this long after

    ClipReadsWF.CreateSequenceGroupingTSVHide
    Call #1:
    ID:operations/EMTjxKegKxiRwdyOoJGP8Vwg_eDe0bUCKg9wcm9kdWN0aW9uUXVldWU
    Status:doneDone
    Started:02/03/2017 at 12:06:39 PM (3 hours ago)
    Ended:02/03/2017 at 2:28:59 PM (26 minutes ago)
    Inputs:Show
    Outputs:Show
    stdout:CreateSequenceGroupingTSV-stdout.log
    stderr:CreateSequenceGroupingTSV-stderr.log
    JES log:CreateSequenceGroupingTSV.log
    
  • esalinasesalinas BroadMember, Broadie ✭✭✭

    I want to point to another similar occurrence. I observe the job that is marked as "Running" in the UI

    FilterWorkflow.oncotatorHide
    Call #1:
    ID:operations/ENT185aiKxiJv8ryk6q35UIg_eDe0bUCKg9wcm9kdWN0aW9uUXVldWU
    Status:
    Running
    Started:02/09/2017 at 7:33:11 AM (2 hours ago)
    Ended:Pending...
    Inputs:Show
    Outputs:None
    stdout:oncotator-stdout.log
    stderr:oncotator-stderr.log
    JES log:oncotator.log
    
    

    However, according to gcloud alpha genomics operations describe, it has been done about 2 hours ago

    [email protected]:~$ gcloud alpha genomics operations describe operations/ENT185aiKxiJv8ryk6q35UIg_eDe0bUCKg9wcm9kdWN0aW9uUXVldWU|head
    done: true
    metadata:
      '@type': type.googleapis.com/google.genomics.v1.OperationMetadata
      clientId: ''
      createTime: '2017-02-09T12:33:11Z'
      endTime: '2017-02-09T12:35:17Z'
      events:
      - description: start
        startTime: '2017-02-09T12:33:45.516646052Z'
      - description: pulling-image
    

    I note in the UI at the top-right it says :

    Workflows: 0 Queued; 292 Active; 0 ahead of yours
    
    

    Given that I see "0 ahead of yours" waiting for this to say "Done" (instead of "Running") and the down-stream tasks to start would not seem expected. If there is a problem with my WDL, would "Failed" be expected instead of running? As previously observed, this task might soon be marked as "Done", but if I see "0 ahead of yours" why isn't it already marked as "Done" and the downstream task starting?

Sign In or Register to comment.