[WDL][Cromwell] Submitting a workflow with a subworkflow to the cloud.

Hi,

I am having trouble submitting a WDL workflow to the cloud when a subworkflow is involved, and I have not found anything in the forums or the spec that explains how to do it. I am attempting to run a variant calling workflow across a large cohort of whole-exome data. The workflow is constructed as a scatter-gather over the individual samples. However, to parallelize the variant calling step with HaplotypeCaller (over intervals), I need a nested scatter-gather, so I moved that step into a subworkflow.

Currently, I am submitting the jobs with the following command:

gcloud alpha genomics pipelines run \
  --pipeline-file wdl_pipeline.yaml \
  --zones us-east1-b \
  --logging gs://dfci-testgenomes/logging \
  --inputs-from-file WDL=VariantCalling.cloud.wdl  \
  --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
  --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
  --inputs WORKSPACE=gs://dfci-testgenomes/workspace \
  --inputs OUTPUTS=gs://dfci-testgenomes/outputs

The following files are located in the same directory as the invocation of the above command:
1) VariantCalling.cloud.wdl
2) VariantCalling.cloud.inputs.json
3) VariantCalling.cloud.options.json
4) subHaplotypeCaller.cloud.wdl

subHaplotypeCaller.cloud.wdl is my sub-workflow. In my main workflow (VariantCalling.cloud.wdl), it is imported and called as follows:

import "subHaplotypeCaller.cloud.wdl" as HaplotypeCaller
...
call HaplotypeCaller.HaplotypeCallerAndGatherVCFs {
    input:
        input_bam = ApplyBQSR.recalibrated_bam,
        input_bam_index = ApplyBQSR.recalibrated_bam_index,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        ref_dict = ref_dict,
        gvcf_basename = inputs[1],
        scattered_calling_intervals = scattered_calling_intervals
}

However, on submission I get the following error from Cromwell:

2017-03-06 22:30:52,742 cromwell-system-akka.actor.default-dispatcher-7 ERROR - WorkflowManagerActor: Workflow failed submission: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1.applyOrElse(MaterializeWorkflowDescriptorActor.scala:69) ~[cromwell.jar:0.19]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[cromwell.jar:0.19]
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:59) ~[cromwell.jar:0.19]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [cromwell.jar:0.19]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) [cromwell.jar:0.19]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [cromwell.jar:0.19]

The issue seems to be that Cromwell cannot locate the sub-workflow file. What is the appropriate way to submit this workflow to gcloud together with the sub-workflow? Please let me know if there is any further information I can provide.

-- Derrick DeConti

Answers

  • deconti (DFCI, Member)

    Thanks, Geraldine.

    Hopefully that support will come about soon. Meanwhile, I think I can get around the issue by mapping the files to intervals, then scattering over a TSV with the files and the intervals.
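
    For anyone trying the same workaround, here is a rough sketch of one way the file-to-interval mapping could be built. It is only an illustration under my own assumptions: the bucket paths, sample names, column order, and the helper names `build_scatter_rows` and `rows_to_tsv` are all made up, and the real inputs would come from your own cohort. The idea is to flatten the nested scatter (samples, then intervals) into a single list of (sample, BAM, index, interval) rows, so that one flat scatter can replace the sub-workflow.

```python
import csv
import io
from itertools import product


def build_scatter_rows(samples, interval_files):
    """Pair every sample with every interval file.

    samples: list of (sample_name, bam_path, bam_index_path) tuples.
    interval_files: list of interval-list paths.
    Returns one row per (sample, interval) combination.
    """
    rows = []
    for (name, bam, bai), interval in product(samples, interval_files):
        rows.append([name, bam, bai, interval])
    return rows


def rows_to_tsv(rows):
    """Serialize the rows as tab-separated text, one row per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical inputs, for illustration only.
samples = [
    ("sampleA", "gs://example-bucket/sampleA.bam", "gs://example-bucket/sampleA.bai"),
    ("sampleB", "gs://example-bucket/sampleB.bam", "gs://example-bucket/sampleB.bai"),
]
interval_files = [
    "gs://example-bucket/intervals/0001.interval_list",
    "gs://example-bucket/intervals/0002.interval_list",
]

rows = build_scatter_rows(samples, interval_files)
tsv_text = rows_to_tsv(rows)
print(tsv_text, end="")
```

    The resulting TSV has one line per HaplotypeCaller job. Because the rows are no longer grouped by sample, the gather step would have to regroup the per-interval outputs by the sample name in the first column before merging them.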
