Job stuck in the running state

Ruchi Member, Broadie, Moderator, Dev admin

My workflow (attached) ran and has a job stuck in the running state. The first task, bsmap, ran and succeeded (the job outputs are in the appropriate Google buckets) and its status is properly updated/reflected in FireCloud/Cromwell. The next task, samtools_sort, has also finished running successfully (the output has been delocalized and there is a proper return code), but its status is still "Running". The last line in the logs for this workflow states "2017-07-19 17:05:23,472 cromwell-system-akka.dispatchers.backend-dispatcher-139 INFO - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.samtools_sort:NA:1]: Status change from - to Success", but the status in FireCloud never updates.

Workflow logs:

2017-07-19 17:05:23,472 cromwell-system-akka.dispatchers.backend-dispatcher-139 INFO  - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.samtools_sort:NA:1]: Status change from - to Success
2017-07-19 16:55:35,151 cromwell-system-akka.dispatchers.backend-dispatcher-172 INFO  - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.samtools_sort:NA:1]: job id: operations/EILnmt7VKxjsyqf28KnJ2J0BILaftM_3CSoPcHJvZHVjdGlvblF1ZXVl
2017-07-19 16:45:11,316 cromwell-system-akka.dispatchers.backend-dispatcher-153 INFO  - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.bsmap:NA:1]: Status change from Running to Success
2017-07-19 15:51:41,874 cromwell-system-akka.dispatchers.backend-dispatcher-125 INFO  - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.bsmap:NA:1]: `bsmap -a /cromwell_root/fc-dceaadae-be69-41ab-a230-0b735c0556c1/sc-RRBS-zygote_01_R1.fastq.gz -b /cromwell_root/fc-dceaadae-be69-41ab-a230-0b735c0556c1/sc-RRBS-zygote_01_R2.fastq.gz -d /cromwell_root/fc-dceaadae-be69-41ab-a230-0b735c0556c1/Mus_musculus_assembly10.fasta -p 4 -v 0.05 -s 16 -r 0 -u -S 1 -R -o sc-RRBS-zygote_01_raw_bs.bam`
2017-07-19 15:51:41,873 cromwell-system-akka.dispatchers.backend-dispatcher-125 WARN  - JesAsyncBackendJobExecutionActor [UUID(4f1968d0)methpipeindv.bsmap:NA:1]: Unrecognized runtime attribute keys: defaultDisks
2017-07-19 15:51:41,142 cromwell-system-akka.dispatchers.engine-dispatcher-39 INFO  - WorkflowExecutionActor-4f1968d0-5060-4a85-aaec-a325d459d46e [UUID(4f1968d0)]: Starting calls: methpipeindv.bsmap:NA:1
2017-07-19 15:51:38,610 cromwell-system-akka.dispatchers.backend-dispatcher-178 INFO  - JES [UUID(4f1968d0)]: Creating authentication file for workflow 4f1968d0-5060-4a85-aaec-a325d459d46e at
 gs://cromwell-auth-broad-firecloud-methylation/4f1968d0-5060-4a85-aaec-a325d459d46e_auth.json
2017-07-19 15:51:38,574 cromwell-system-akka.dispatchers.backend-dispatcher-178 WARN  - JES [UUID(4f1968d0)]: Key/s [defaultDisks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2017-07-19 15:51:38,574 cromwell-system-akka.dispatchers.backend-dispatcher-178 WARN  - JES [UUID(4f1968d0)]: Key/s [defaultDisks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2017-07-19 15:51:38,574 cromwell-system-akka.dispatchers.backend-dispatcher-178 WARN  - JES [UUID(4f1968d0)]: Key/s [defaultDisks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2017-07-19 15:51:38,574 cromwell-system-akka.dispatchers.backend-dispatcher-178 WARN  - JES [UUID(4f1968d0)]: Key/s [defaultDisks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2017-07-19 15:51:38,574 cromwell-system-akka.dispatchers.backend-dispatcher-178 WARN  - JES [UUID(4f1968d0)]: Key/s [defaultDisks] is/are not supported by backend. Unsupported attributes will not be part of job executions.
2017-07-19 15:51:38,573 cromwell-system-akka.dispatchers.engine-dispatcher-63 INFO  - MaterializeWorkflowDescriptorActor [UUID(4f1968d0)]: Call-to-Backend assignments: methpipeindv.samtools_read_metrics -> JES, methpipeindv.MethylDackel -> JES, methpipeindv.bsmap -> JES, methpipeindv.samtools_sort -> JES, methpipeindv.MethylDackel_CHH -> JES, methpipeindv.create_rda -> JES, methpipeindv.bs_conversion_rate -> JES
2017-07-19 15:51:38,525 cromwell-system-akka.dispatchers.engine-dispatcher-90 INFO  - WorkflowManagerActor Successfully started WorkflowActor-4f1968d0-5060-4a85-aaec-a325d459d46e
2017-07-19 15:51:38,525 cromwell-system-akka.dispatchers.engine-dispatcher-90 INFO  - WorkflowManagerActor Starting workflow UUID(4f1968d0-5060-4a85-aaec-a325d459d46e)
2017-07-19 15:51:37,424 cromwell-system-akka.dispatchers.api-dispatcher-904 INFO  - Workflows 4f1968d0-5060-4a85-aaec-a325d459d46e submitted.

Answers

  • Ruchi Member, Broadie, Moderator, Dev admin

    WDL:

    task bsmap {
      File fastq1
      File fastq2
      File ref_genome
      String sample
    
    
      command {
             bsmap -a ${fastq1} -b ${fastq2} -d ${ref_genome} -p 4 -v 0.05 -s 16 -r 0 -u -S 1 -R -o ${sample}_raw_bs.bam
      }
      runtime {
              docker: "adunford/methy:9"
              memory: "16 GB"
              defaultDisks: "local-disk 100 SSD"
      }
      output {
             File raw_bs_bam = "./${sample}_raw_bs.bam"
             File genome = "${ref_genome}"
             String sample_id = "${sample}"
      }
    }
    
    task samtools_sort {
      File raw_bs_bam
      String sample_id
      command {
              samtools sort ${raw_bs_bam} ${sample_id}_bs.sorted && samtools index ${sample_id}_bs.sorted.bam
      }
      runtime {
              docker: "adunford/methy:9"
              memory: "16 GB"
              defaultDisks: "local-disk 100 SSD"
      }
      output {
             File sorted_bs_bam   = "${sample_id}_bs.sorted.bam"
      }
    }
    
    task samtools_read_metrics{
         File sorted_bs_bam
         String sample_id
         command{
            echo ${sample_id} `samtools view ${sorted_bs_bam} | wc -l` `samtools view -F 4 ${sorted_bs_bam} | wc -l` > ${sample_id}.read_metrics.txt
         }
         runtime{
            docker: "adunford/methy:9"
            memory: "16 GB"
            defaultDisks: "local-disk 100 SSD"
         }
         output {
            File read_metrics = "${sample_id}.read_metrics.txt"
         }
    }
    
    task MethylDackel {
            File genome
            File sorted_bs_bam
            String sample_id
            command {
                    MethylDackel extract ${genome} ${sorted_bs_bam} -o ${sample_id}
                    grep -v '^track' ${sample_id}_CpG.bedGraph  > tmp
                    mv tmp ${sample_id}_CpG.bedGraph
            }
            runtime {
                  docker: "adunford/methy:9"
                  memory: "16 GB"
                  defaultDisks: "local-disk 100 SSD"
    
            }
            output {
                    File bed = "${sample_id}_CpG.bedGraph"
            }
    }
    
    task MethylDackel_CHH {
         File genome
         File sorted_bs_bam
         String sample_id
         command {
                 MethylDackel extract --CHH ${genome} ${sorted_bs_bam} -o ${sample_id}
         }
         runtime {
                 docker: "adunford/methy:9"
                 memory: "16 GB"
                 defaultDisks: "local-disk 100 SSD"
    
         }
         output {
                File chh_bed = "${sample_id}_CHH.bedGraph"
         }
    }
    task bs_conversion_rate{
         File chh_bed
         String sample_id
         command {
                 sh /executable_files/collect_bsconv_metrics.sh ${sample_id} ${chh_bed} 
         }
         runtime {
                 docker: "adunford/methy:9"
         }
         output{
                 File bsconv = "${sample_id}_bsconv.txt"
         }
    }
    task create_rda {
         File bed
         String sample_id
         File bsconv
         File read_metrics
         command{
                 Rscript /Rscripts/create_rda_wrapper.R -f ${bed} -o ${sample_id}.rda -b ${bsconv} -r ${read_metrics}
         }
         runtime {
                 docker: "adunford/methy:9"
         }
         output {
                File rda = "${sample_id}.rda"
         }
    }
    workflow methpipeindv {
             File ref_genome
             File sample_id
             call bsmap     {input: sample = sample_id, ref_genome = ref_genome}
             call samtools_sort {input: raw_bs_bam = bsmap.raw_bs_bam, sample_id = sample_id }
             call samtools_read_metrics {input: sorted_bs_bam = samtools_sort.sorted_bs_bam, sample_id = sample_id}
             call MethylDackel {input: sorted_bs_bam = samtools_sort.sorted_bs_bam, sample_id = sample_id, genome = ref_genome}
             call MethylDackel_CHH  {input: sorted_bs_bam = samtools_sort.sorted_bs_bam, sample_id = sample_id, genome = ref_genome}
             call bs_conversion_rate {input: chh_bed = MethylDackel_CHH.chh_bed, sample_id = sample_id}
             call create_rda    {input: bed = MethylDackel.bed, sample_id = sample_id,bsconv = bs_conversion_rate.bsconv, read_metrics = samtools_read_metrics.read_metrics}
    }
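    One side note on the WDL above: the repeated "Unrecognized runtime attribute keys: defaultDisks" warnings in the workflow log come from these runtime blocks. They aren't what is hanging the workflow, but they do mean the disk request is being ignored. If I recall correctly, the runtime attribute the JES backend recognizes is disks rather than defaultDisks, so a runtime block along these lines (just a sketch, reusing the same image and size as the tasks above) should make the warnings go away:

    runtime {
      docker: "adunford/methy:9"
      memory: "16 GB"
      # "disks" is the key the JES backend recognizes; "defaultDisks" is the
      # attribute reported as unsupported in the workflow log above.
      disks: "local-disk 100 SSD"
    }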
    
  • KateN Cambridge, MA Member, Broadie, Moderator admin
    edited July 2017

    I think you may be seeing the same issue another user found here, which we believe is due to the Google Genomics Pipelines API bug behind the recent slowdown: Long submission or completion delays due to Google Genomics Pipelines API bug. How long ago did you run this workflow? If it eventually turns to a Done state, then it would be this bug.

  • aryee Member, Broadie

    This particular workflow (above) has been stuck in the running state for 24 hours since the second task produced its outputs, without moving on to the next task. We have previously had it in this state for several days.

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    That sounds different, then; my understanding of the Google bug is that it affected jobs while they were first spinning up or closing down, not between tasks. Could you share the workspace with [email protected], and tell us the name here? I will have a developer take a look.

  • aryee Member, Broadie

    I have shared the workspace (broad-firecloud-methylation/Methylation_Pipeline) with [email protected]. The workflow submission ID is eb96b3cd-d635-4e3e-b43e-7163aae5a9fb.

  • aryee Member, Broadie

    This workflow is still stuck in the submitted state in FireCloud (7 days later), but appears to have stopped running. I was able to submit a dummy workflow ("sleep 3"; a sketch of what that looks like is below), and it did complete successfully.
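    The dummy workflow was nothing fancy; something along the lines of the sketch below is enough for this kind of smoke test (the task/workflow names and the docker image here are placeholders, not the exact file that was submitted):

    task sleep_test {
      command {
        sleep 3
      }
      runtime {
        # any small public image will do for a smoke test
        docker: "ubuntu:16.04"
      }
      output {
        # capture stdout just so the task has something to delocalize
        File out = stdout()
      }
    }

    workflow sleep_smoke_test {
      call sleep_test
    }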

  • KateN Cambridge, MA Member, Broadie, Moderator admin

    I'm sorry to hear it is still stuck; we are working on the bug now, and I will update you when I have news.

  • Ruchi Member, Broadie, Moderator, Dev admin

    @KateN We've figured out that this workflow was hanging because dot-dirs (paths with a leading "./") aren't supported in Google Cloud Storage file paths, and Cromwell doesn't handle that case properly. I've filed an issue at https://github.com/broadinstitute/cromwell/issues/2506; in the meantime, @aryee has adjusted his workflow to remove the dot-dir (sketched below), and I believe they've moved past this issue. Thanks!
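    For anyone who runs into the same hang: the dot-dir in question is the leading "./" on the bsmap output path in the WDL above, and the workaround is simply to declare the output without it, roughly:

    output {
      # before: File raw_bs_bam = "./${sample}_raw_bs.bam"
      # the leading "./" is what Cromwell/JES trips over
      File raw_bs_bam = "${sample}_raw_bs.bam"
      File genome = "${ref_genome}"
      String sample_id = "${sample}"
    }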

  • KateN Cambridge, MA Member, Broadie, Moderator admin
    Fantastic, glad to hear!