[WDL][Cromwell] Submitting a workflow with a subworkflow to the cloud.

Hi,

I am having trouble working out how to properly submit a WDL workflow to the cloud when a subworkflow is involved, and I have not been able to find anything in the forums or the spec that explains this. I am attempting to run a variant calling workflow across a large cohort of whole exome data. The workflow is constructed as a scatter-gather over the individual samples. However, to parallelize the variant calling step with HaplotypeCaller (over intervals), I needed a nested scatter-gather, so I invoked a subworkflow.
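For context, the overall shape is roughly the following (a minimal sketch with placeholder task names, not my actual workflow): the outer workflow scatters over samples, and the imported sub-workflow handles the inner per-interval scatter.

```wdl
# Sketch only -- task names and inputs are placeholders.
import "subHaplotypeCaller.cloud.wdl" as HaplotypeCaller

workflow VariantCalling {
  # Each inner array holds the per-sample inputs (e.g. bam path, basename)
  Array[Array[String]] sample_inputs

  # Outer scatter: one branch per sample
  scatter (inputs in sample_inputs) {
    # ... alignment, BQSR, etc. ...

    # The sub-workflow call performs the nested scatter over intervals
    call HaplotypeCaller.HaplotypeCallerAndGatherVCFs {
      input:
        gvcf_basename = inputs[1]
        # ... remaining inputs ...
    }
  }
}
```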

Currently, I am submitting the jobs with the following command:

gcloud alpha genomics pipelines run \
  --pipeline-file wdl_pipeline.yaml \
  --zones us-east1-b \
  --logging gs://dfci-testgenomes/logging \
  --inputs-from-file WDL=VariantCalling.cloud.wdl  \
  --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
  --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
  --inputs WORKSPACE=gs://dfci-testgenomes/workspace \
  --inputs OUTPUTS=gs://dfci-testgenomes/outputs

The following files are located in the same directory as the invocation of the above command:
1) VariantCalling.cloud.wdl
2) VariantCalling.cloud.inputs.json
3) VariantCalling.cloud.options.json
4) subHaplotypeCaller.cloud.wdl

subHaplotypeCaller.cloud.wdl is my sub-workflow. In my main workflow (VariantCalling.cloud.wdl), it is imported and called as follows:

import "subHaplotypeCaller.cloud.wdl" as HaplotypeCaller
...
call HaplotypeCaller.HaplotypeCallerAndGatherVCFs {
    input:
        input_bam = ApplyBQSR.recalibrated_bam,
        input_bam_index = ApplyBQSR.recalibrated_bam_index,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        ref_dict = ref_dict,
        gvcf_basename = inputs[1],
        scattered_calling_intervals = scattered_calling_intervals
}
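For reference, the sub-workflow itself scatters HaplotypeCaller over the interval lists and then gathers the per-interval GVCFs, roughly along these lines (a simplified sketch; task bodies omitted and the merge task name is made up):

```wdl
# subHaplotypeCaller.cloud.wdl -- simplified sketch, task bodies omitted.
workflow HaplotypeCallerAndGatherVCFs {
  File input_bam
  File input_bam_index
  File ref_fasta
  File ref_fasta_index
  File ref_dict
  String gvcf_basename
  Array[File] scattered_calling_intervals

  # Inner scatter: one HaplotypeCaller job per interval list
  scatter (interval_list in scattered_calling_intervals) {
    call HaplotypeCaller {
      input:
        input_bam = input_bam,
        input_bam_index = input_bam_index,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        ref_dict = ref_dict,
        gvcf_basename = gvcf_basename,
        interval_list = interval_list
    }
  }

  # Gather step: merge the per-interval GVCFs into one file
  call MergeGVCFs {
    input:
      input_vcfs = HaplotypeCaller.output_gvcf,
      output_basename = gvcf_basename
  }
}
```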

However, upon submission I get the following error from Cromwell:

2017-03-06 22:30:52,742 cromwell-system-akka.actor.default-dispatcher-7 ERROR - WorkflowManagerActor: Workflow failed submission: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1.applyOrElse(MaterializeWorkflowDescriptorActor.scala:69) ~[cromwell.jar:0.19]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[cromwell.jar:0.19]
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:59) ~[cromwell.jar:0.19]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [cromwell.jar:0.19]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) [cromwell.jar:0.19]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [cromwell.jar:0.19]

It seems the issue is that Cromwell cannot locate the sub-workflow. What would be the appropriate way to submit this workflow to gcloud along with its sub-workflow? Please let me know if there is any further information I can provide.

-- Derrick DeConti

Answers

  • deconti (DFCI) Member

    Thanks, Geraldine.

    Hopefully that support will come about soon. Meanwhile, I think I can get around the issue by mapping the files to intervals, then scattering over a TSV with the files and the intervals.
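    In case it helps anyone else, the workaround I have in mind flattens the nested scatter into a single scatter over a TSV that pairs each BAM with an interval list. A rough sketch (names are made up; each TSV row would be bam, bam_index, interval_list):

```wdl
# Sketch of the flattened workaround: one TSV row per (sample, interval)
# combination, so a single flat scatter replaces the nested one.
workflow FlattenedVariantCalling {
  File sample_interval_tsv

  # read_tsv yields Array[Array[String]], one inner array per row
  Array[Array[String]] rows = read_tsv(sample_interval_tsv)

  scatter (row in rows) {
    call HaplotypeCaller {
      input:
        input_bam = row[0],
        input_bam_index = row[1],
        interval_list = row[2]
    }
  }
}
```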
