Output section in workflow causes "Unrecognized token" error

manidr, Cambridge, MA (Member)

The following WDL, cna_analysis.wdl, validates with wdltool (v0.12, using gdac-firecloud) with no errors:

task cna_analysis {
  File rna
  File cna
  File pome
  String prefix
  Int jidMax
  Int jid
  String codeDir = "/prot/proteomics/Projects/PGDAC/src"

  command {
    set -euo pipefail
    # setup directories and code
    cp ${codeDir}/cna-analysis.r ${codeDir}/generate-cna-plots.r .
    if [ ! -d ${prefix}-output ]; then 
      mkdir ${prefix}-output 
    fi
    # run cna analysis for corresponding shard / gather
    Rscript cna-analysis.r ${jid} ${jidMax} ${prefix} ${rna} ${cna} ${pome}
  }

  output {
    File rna_cna_corr = "${prefix}-output/mrna-vs-cna-corr${jid}.csv"
    File rna_cna_pval = "${prefix}-output/mrna-vs-cna-pval${jid}.csv"
    File pome_cna_corr = "${prefix}-output/pome-vs-cna-corr${jid}.csv"
    File pome_cna_pval = "${prefix}-output/pome-vs-cna-pval${jid}.csv"
  }

  runtime {
    docker : "broadcptac/pgdac_basic:1"
  }
}


task gather_results_and_plot {
  String prefix
  Int jidMax
  Array[File] rna_vs_cna_corr
  Array[File] rna_vs_cna_pval
  Array[File] pome_vs_cna_corr
  Array[File] pome_vs_cna_pval
  String codeDir = "/prot/proteomics/Projects/PGDAC/src"
  String dataDir = "/prot/proteomics/Projects/PGDAC/data"


  command {
    set -euo pipefail
    # setup directories and code
    cp ${codeDir}/cna-analysis.r ${codeDir}/generate-cna-plots.r .
    cp ${dataDir}/chr-length.csv ${dataDir}/gene-location.csv .
    if [ ! -d ${prefix}-output ]; then 
      mkdir ${prefix}-output 
    fi
    # copy results from scatter operation
    mv ${sep=" " rna_vs_cna_corr} ${prefix}-output
    mv ${sep=" " rna_vs_cna_pval} ${prefix}-output
    mv ${sep=" " pome_vs_cna_corr} ${prefix}-output
    mv ${sep=" " pome_vs_cna_pval} ${prefix}-output
    # run cna analysis for corresponding shard / gather
    Rscript cna-analysis.r 0 ${jidMax} ${prefix} NULL NULL NULL
  }

  output {
    Array[File] tables=glob ("${prefix}-*-vs-*.csv")
    File plot="${prefix}-cna-plot.png"
  }

  runtime {
    docker : "broadcptac/pgdac_basic:1"
  }
}



workflow run_cna_analysis {
  File rna
  File cna
  File pome
  String prefix
  Int jidMax
  File jidsFile
  Array[Int] jids = read_lines ("${jidsFile}")

  scatter (i in jids) {
    call cna_analysis {
      input:
        rna=rna,
        cna=cna,
        pome=pome,
        prefix=prefix,
        jidMax=jidMax,
        jid=i
    }
  }

  call gather_results_and_plot {
    input:
      prefix=prefix,
      jidMax=jidMax,
      rna_vs_cna_corr=cna_analysis.rna_cna_corr,
      rna_vs_cna_pval=cna_analysis.rna_cna_pval,
      pome_vs_cna_corr=cna_analysis.pome_cna_corr,
      pome_vs_cna_pval=cna_analysis.pome_cna_pval
  }

  output {
    Array[File] tables = gather_results_and_plot.tables
    File plot = gather_results_and_plot.plot
  }
}

When I run this locally with Cromwell (v28, using gdac-firecloud), I get the following error:

bash-3.2$ java -jar ../../bin/cromwell.jar run cna_analysis.wdl tests/inputs.json
Picked up _JAVA_OPTIONS: -Xmx4096m
[2017-09-27 07:52:04,87] [info] Slf4jLogger started
[2017-09-27 07:52:05,11] [info] RUN sub-command
[2017-09-27 07:52:05,11] [info]   WDL file: /Volumes/prot_proteomics/LabMembers/manidr/PGDAC/gdac-firecloud/workflows/cna_analysis/cna_analysis.wdl
[2017-09-27 07:52:05,11] [info]   Inputs: /Volumes/prot_proteomics/LabMembers/manidr/PGDAC/gdac-firecloud/workflows/cna_analysis/tests/inputs.json
[2017-09-27 07:52:05,92] [info] SingleWorkflowRunnerActor: Submitting workflow
[2017-09-27 07:52:11,29] [info] Running with database db.url = jdbc:hsqldb:mem:6b6ba033-2e9a-43cf-88c7-f47795378c7d;shutdown=false;hsqldb.tx=mvcc
[2017-09-27 07:53:01,74] [info] Metadata summary refreshing every 2 seconds.
[2017-09-27 07:53:01,93] [info] SingleWorkflowRunnerActor: Workflow submitted e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,94] [info] Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a submitted.
[2017-09-27 07:53:01,99] [info] 1 new workflows fetched
[2017-09-27 07:53:01,99] [info] WorkflowManagerActor Starting workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,99] [info] WorkflowManagerActor Successfully started WorkflowActor-e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,99] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2017-09-27 07:53:02,05] [error] WorkflowManagerActor Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anonfun$2$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: Unrecognized token on line 123, column 10:

    Array[File] tables = gather_results_and_plot.tables
         ^
[2017-09-27 07:53:02,05] [info] WorkflowManagerActor WorkflowActor-e8c4d20d-c743-41e4-a6a5-b747fd16ea8a is in a terminal state: WorkflowFailedState
[2017-09-27 07:53:04,47] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
[2017-09-27 07:53:04,47] [error] Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
java.lang.RuntimeException: Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$RunnerData.addFailure(SingleWorkflowRunnerActor.scala:51)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$$anonfun$2.applyOrElse(SingleWorkflowRunnerActor.scala:128)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$$anonfun$2.applyOrElse(SingleWorkflowRunnerActor.scala:106)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at akka.actor.FSM$class.processEvent(FSM.scala:663)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor.akka$actor$LoggingFSM$$super$processEvent(SingleWorkflowRunnerActor.scala:68)
    at akka.actor.LoggingFSM$class.processEvent(FSM.scala:799)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor.processEvent(SingleWorkflowRunnerActor.scala:68)
    at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:657)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:651)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
    at cromwell.server.CromwellRootActor.aroundReceive(CromwellRootActor.scala:26)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I have double-checked the syntax of the output section and can't find any error. Furthermore, when I remove the workflow-level output section, the workflow runs to completion with no errors.

I am trying to write a two-level scatter-gather workflow (using the method outlined in https://gatkforums.broadinstitute.org/wdl/discussion/8569/two-level-scatter-gather), and this WDL will be a sub-workflow -- hence the need to define outputs for use in the main workflow. Any help in resolving this is greatly appreciated.
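
For context, here is a minimal sketch of how a main workflow might call this WDL as a sub-workflow and pick up its outputs. It assumes cna_analysis.wdl can be imported from the same location as the main WDL; the import alias cna_sub and the workflow name main_cna_pipeline are hypothetical and only for illustration.

```wdl
# Hypothetical main workflow: the alias (cna_sub) and the workflow name
# (main_cna_pipeline) are illustrative and not part of the original WDL.
import "cna_analysis.wdl" as cna_sub

workflow main_cna_pipeline {
  File rna
  File cna
  File pome
  File jidsFile
  String prefix
  Int jidMax

  # Call the sub-workflow defined in cna_analysis.wdl
  call cna_sub.run_cna_analysis {
    input:
      rna=rna,
      cna=cna,
      pome=pome,
      prefix=prefix,
      jidMax=jidMax,
      jidsFile=jidsFile
  }

  # These references only resolve if run_cna_analysis declares
  # tables and plot in its own output section.
  output {
    Array[File] tables = run_cna_analysis.tables
    File plot = run_cna_analysis.plot
  }
}
```

Without the output section in run_cna_analysis, the main workflow would have nothing to refer to as run_cna_analysis.tables or run_cna_analysis.plot, which is why removing that section is not a workable fix for me.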

Answers

  • manidr, Cambridge, MA (Member)

    Hello @kshakir,

    Thanks very much for your response. It looks like the problem was caused by the Cromwell version I was using (v28). I downloaded v29 and the problem went away.

    By the way, the complete cna_analysis.wdl was included in my message. The line number discrepancy arises because I deleted extraneous sections (like meta) and unnecessary empty lines to make the posted WDL as compact as possible.
