Output section in workflow causes "Unrecognized token" error

manidr, Cambridge, MA (Member)

I have the following WDL that validates with wdltool (v 0.12, using gdac-firecloud) with no errors:
cna_analysis.wdl

task cna_analysis {
  File rna
  File cna
  File pome
  String prefix
  Int jidMax
  Int jid
  String codeDir = "/prot/proteomics/Projects/PGDAC/src"

  command {
    set -euo pipefail
    # setup directories and code
    cp ${codeDir}/cna-analysis.r ${codeDir}/generate-cna-plots.r .
    if [ ! -d ${prefix}-output ]; then 
      mkdir ${prefix}-output 
    fi
    # run cna analysis for corresponding shard / gather
    Rscript cna-analysis.r ${jid} ${jidMax} ${prefix} ${rna} ${cna} ${pome}
  }

  output {
    File rna_cna_corr = "${prefix}-output/mrna-vs-cna-corr${jid}.csv"
    File rna_cna_pval = "${prefix}-output/mrna-vs-cna-pval${jid}.csv"
    File pome_cna_corr = "${prefix}-output/pome-vs-cna-corr${jid}.csv"
    File pome_cna_pval = "${prefix}-output/pome-vs-cna-pval${jid}.csv"
  }

  runtime {
    docker : "broadcptac/pgdac_basic:1"
  }
}


task gather_results_and_plot {
  String prefix
  Int jidMax
  Array[File] rna_vs_cna_corr
  Array[File] rna_vs_cna_pval
  Array[File] pome_vs_cna_corr
  Array[File] pome_vs_cna_pval
  String codeDir = "/prot/proteomics/Projects/PGDAC/src"
  String dataDir = "/prot/proteomics/Projects/PGDAC/data"


  command {
    set -euo pipefail
    # setup directories and code
    cp ${codeDir}/cna-analysis.r ${codeDir}/generate-cna-plots.r .
    cp ${dataDir}/chr-length.csv ${dataDir}/gene-location.csv .
    if [ ! -d ${prefix}-output ]; then 
      mkdir ${prefix}-output 
    fi
    # copy results from scatter operation
    mv ${sep=" " rna_vs_cna_corr} ${prefix}-output
    mv ${sep=" " rna_vs_cna_pval} ${prefix}-output
    mv ${sep=" " pome_vs_cna_corr} ${prefix}-output
    mv ${sep=" " pome_vs_cna_pval} ${prefix}-output
    # run cna analysis for corresponding shard / gather
    Rscript cna-analysis.r 0 ${jidMax} ${prefix} NULL NULL NULL
  }

  output {
    Array[File] tables=glob ("${prefix}-*-vs-*.csv")
    File plot="${prefix}-cna-plot.png"
  }

  runtime {
    docker : "broadcptac/pgdac_basic:1"
  }
}



workflow run_cna_analysis {
  File rna
  File cna
  File pome
  String prefix
  Int jidMax
  File jidsFile
  Array[Int] jids = read_lines ("${jidsFile}")

  scatter (i in jids) {
    call cna_analysis {
      input:
        rna=rna,
        cna=cna,
        pome=pome,
        prefix=prefix,
        jidMax=jidMax,
        jid=i
    }
  }

  call gather_results_and_plot {
    input:
      prefix=prefix,
      jidMax=jidMax,
      rna_vs_cna_corr=cna_analysis.rna_cna_corr,
      rna_vs_cna_pval=cna_analysis.rna_cna_pval,
      pome_vs_cna_corr=cna_analysis.pome_cna_corr,
      pome_vs_cna_pval=cna_analysis.pome_cna_pval
  }

  output {
    Array[File] tables = gather_results_and_plot.tables
    File plot = gather_results_and_plot.plot
  }
}

When I run this locally using cromwell (v 28, using gdac-firecloud), I get the following error:

bash-3.2$ java -jar ../../bin/cromwell.jar run cna_analysis.wdl tests/inputs.json
Picked up _JAVA_OPTIONS: -Xmx4096m
[2017-09-27 07:52:04,87] [info] Slf4jLogger started
[2017-09-27 07:52:05,11] [info] RUN sub-command
[2017-09-27 07:52:05,11] [info]   WDL file: /Volumes/prot_proteomics/LabMembers/manidr/PGDAC/gdac-firecloud/workflows/cna_analysis/cna_analysis.wdl
[2017-09-27 07:52:05,11] [info]   Inputs: /Volumes/prot_proteomics/LabMembers/manidr/PGDAC/gdac-firecloud/workflows/cna_analysis/tests/inputs.json
[2017-09-27 07:52:05,92] [info] SingleWorkflowRunnerActor: Submitting workflow
[2017-09-27 07:52:11,29] [info] Running with database db.url = jdbc:hsqldb:mem:6b6ba033-2e9a-43cf-88c7-f47795378c7d;shutdown=false;hsqldb.tx=mvcc
[2017-09-27 07:53:01,74] [info] Metadata summary refreshing every 2 seconds.
[2017-09-27 07:53:01,93] [info] SingleWorkflowRunnerActor: Workflow submitted e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,94] [info] Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a submitted.
[2017-09-27 07:53:01,99] [info] 1 new workflows fetched
[2017-09-27 07:53:01,99] [info] WorkflowManagerActor Starting workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,99] [info] WorkflowManagerActor Successfully started WorkflowActor-e8c4d20d-c743-41e4-a6a5-b747fd16ea8a
[2017-09-27 07:53:01,99] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2017-09-27 07:53:02,05] [error] WorkflowManagerActor Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a failed (during MaterializingWorkflowDescriptorState): cromwell.engine.workflow.lifecycle.MaterializeWorkflowDescriptorActor$$anonfun$2$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: Unrecognized token on line 123, column 10:

    Array[File] tables = gather_results_and_plot.tables
         ^
[2017-09-27 07:53:02,05] [info] WorkflowManagerActor WorkflowActor-e8c4d20d-c743-41e4-a6a5-b747fd16ea8a is in a terminal state: WorkflowFailedState
[2017-09-27 07:53:04,47] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
[2017-09-27 07:53:04,47] [error] Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
java.lang.RuntimeException: Workflow e8c4d20d-c743-41e4-a6a5-b747fd16ea8a transitioned to state Failed
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$RunnerData.addFailure(SingleWorkflowRunnerActor.scala:51)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$$anonfun$2.applyOrElse(SingleWorkflowRunnerActor.scala:128)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor$$anonfun$2.applyOrElse(SingleWorkflowRunnerActor.scala:106)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at akka.actor.FSM$class.processEvent(FSM.scala:663)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor.akka$actor$LoggingFSM$$super$processEvent(SingleWorkflowRunnerActor.scala:68)
    at akka.actor.LoggingFSM$class.processEvent(FSM.scala:799)
    at cromwell.engine.workflow.SingleWorkflowRunnerActor.processEvent(SingleWorkflowRunnerActor.scala:68)
    at akka.actor.FSM$class.akka$actor$FSM$$processMsg(FSM.scala:657)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:651)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:484)
    at cromwell.server.CromwellRootActor.aroundReceive(CromwellRootActor.scala:26)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
    at akka.actor.ActorCell.invoke(ActorCell.scala:495)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
    at akka.dispatch.Mailbox.run(Mailbox.scala:224)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

I have double-checked the syntax of the output section and can't find any error. Furthermore, when I remove the output section, the workflow runs to completion with no errors or problems.

I am trying to write a two-level scatter-gather workflow (using the method outlined in https://gatkforums.broadinstitute.org/wdl/discussion/8569/two-level-scatter-gather), and this WDL will be a sub-workflow -- hence the need to define outputs for use in the main workflow. Any help in resolving this is greatly appreciated.
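
For context, the intended use as a sub-workflow might look roughly like the sketch below. This is only an illustration, not the actual main workflow: the workflow name main_cna and the import alias cna_sub are hypothetical, and it assumes cna_analysis.wdl can be resolved by the import statement (how imports are located depends on how Cromwell is invoked). The point is that the main workflow can only refer to run_cna_analysis.tables and run_cna_analysis.plot if the sub-workflow declares them in its output section.

```wdl
# Hypothetical main workflow (names main_cna and cna_sub are assumptions).
import "cna_analysis.wdl" as cna_sub

workflow main_cna {
  File rna
  File cna
  File pome
  String prefix
  Int jidMax
  File jidsFile

  # Call the sub-workflow through its import alias.
  call cna_sub.run_cna_analysis {
    input:
      rna=rna,
      cna=cna,
      pome=pome,
      prefix=prefix,
      jidMax=jidMax,
      jidsFile=jidsFile
  }

  # These references rely on the output section of run_cna_analysis.
  output {
    Array[File] tables = run_cna_analysis.tables
    File plot = run_cna_analysis.plot
  }
}
```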

Answers

  • manidr, Cambridge, MA (Member)

    Hello @kshakir,

    Thanks very much for your response. It looks like the problem was caused by the cromwell version that I was using. I downloaded v29 and the problem went away.

    By the way, the complete cna_analysis.wdl was included in my message. The line number discrepancy is because I deleted extraneous sections (like meta) and unnecessary empty lines to make the WDL as compact as possible.
