[WDL][Cromwell] Submitting a workflow with a subworkflow to the cloud.

deconti (DFCI, Member)

Hi,

I am having trouble figuring out how to correctly submit a WDL workflow to the cloud when a sub-workflow is involved; I have not been able to find anything in the forums or in the spec that explains this. I am attempting to run a variant calling workflow across a large cohort of whole-exome data. The workflow is constructed as a scatter-gather over the individual samples. However, to parallelize the variant calling step with HaplotypeCaller (over intervals), I needed a nested scatter-gather, so I pulled that step out into a sub-workflow.

Currently, I am submitting the jobs with the following command:

gcloud alpha genomics pipelines run \
  --pipeline-file wdl_pipeline.yaml \
  --zones us-east1-b \
  --logging gs://dfci-testgenomes/logging \
  --inputs-from-file WDL=VariantCalling.cloud.wdl  \
  --inputs-from-file WORKFLOW_INPUTS=VariantCalling.cloud.inputs.json \
  --inputs-from-file WORKFLOW_OPTIONS=VariantCalling.cloud.options.json \
  --inputs WORKSPACE=gs://dfci-testgenomes/workspace \
  --inputs OUTPUTS=gs://dfci-testgenomes/outputs

The following files are located in the same directory from which the above command is invoked:
1) VariantCalling.cloud.wdl
2) VariantCalling.cloud.inputs.json
3) VariantCalling.cloud.options.json
4) subHaplotypeCaller.cloud.wdl

subHaplotypeCaller.cloud.wdl is my sub-workflow. In my main workflow (VariantCalling.cloud.wdl), it is imported and called as follows:

import "subHaplotypeCaller.cloud.wdl" as HaplotypeCaller
...
call HaplotypeCaller.HaplotypeCallerAndGatherVCFs {
    input:
        input_bam = ApplyBQSR.recalibrated_bam,
        input_bam_index = ApplyBQSR.recalibrated_bam_index,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        ref_dict = ref_dict,
        gvcf_basename = inputs[1],
        scattered_calling_intervals = scattered_calling_intervals
}
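
For context, here is roughly what subHaplotypeCaller.cloud.wdl looks like. This is a simplified sketch rather than my exact file: the command lines, jar paths, docker image, and resource settings are placeholders.

task HaplotypeCaller {
  File input_bam
  File input_bam_index
  File interval_list
  File ref_fasta
  File ref_fasta_index
  File ref_dict
  String gvcf_basename

  command {
    # GATK 3.x style invocation; the jar path is a placeholder
    java -Xmx8g -jar /usr/gitc/GATK.jar \
      -T HaplotypeCaller \
      -R ${ref_fasta} \
      -I ${input_bam} \
      -L ${interval_list} \
      -ERC GVCF \
      -o ${gvcf_basename}.g.vcf.gz
  }
  output {
    File output_gvcf = "${gvcf_basename}.g.vcf.gz"
  }
  runtime {
    # placeholder; use whatever GATK/Picard image you normally run
    docker: "broadinstitute/genomes-in-the-cloud:latest"
  }
}

task GatherVCFs {
  Array[File] input_vcfs
  String output_basename

  command {
    # Picard MergeVcfs concatenates the per-interval GVCFs back into one file
    java -jar /usr/gitc/picard.jar MergeVcfs \
      INPUT=${sep=' INPUT=' input_vcfs} \
      OUTPUT=${output_basename}.g.vcf.gz
  }
  output {
    File merged_gvcf = "${output_basename}.g.vcf.gz"
  }
  runtime {
    # placeholder image, as above
    docker: "broadinstitute/genomes-in-the-cloud:latest"
  }
}

workflow HaplotypeCallerAndGatherVCFs {
  File input_bam
  File input_bam_index
  File ref_fasta
  File ref_fasta_index
  File ref_dict
  String gvcf_basename
  Array[File] scattered_calling_intervals

  # Inner scatter: one HaplotypeCaller shard per interval list
  scatter (interval_list in scattered_calling_intervals) {
    call HaplotypeCaller {
      input:
        input_bam = input_bam,
        input_bam_index = input_bam_index,
        interval_list = interval_list,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        ref_dict = ref_dict,
        gvcf_basename = gvcf_basename
    }
  }

  # Gather step: merge the per-interval GVCFs into a single per-sample GVCF
  call GatherVCFs {
    input:
      input_vcfs = HaplotypeCaller.output_gvcf,
      output_basename = gvcf_basename
  }
}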

However, upon submission I get the following error from Cromwell:

2017-03-06 22:30:52,742 cromwell-system-akka.actor.default-dispatcher-7 ERROR - WorkflowManagerActor: Workflow failed submission: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1$$anon$1: Workflow input processing failed.
Unable to load namespace from workflow: /wdl_runner/subHaplotypeCaller.cloud.wdl
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor$$anonfun$receive$1.applyOrElse(MaterializeWorkflowDescriptorActor.scala:69) ~[cromwell.jar:0.19]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[cromwell.jar:0.19]
    at cromwell.engine.workflow.MaterializeWorkflowDescriptorActor.aroundReceive(MaterializeWorkflowDescriptorActor.scala:59) ~[cromwell.jar:0.19]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [cromwell.jar:0.19]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [cromwell.jar:0.19]
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) [cromwell.jar:0.19]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [cromwell.jar:0.19]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [cromwell.jar:0.19]

It seems the issue is that Cromwell cannot locate the sub-workflow file. What is the appropriate way to submit this workflow to gcloud so that the sub-workflow is found? Please let me know if there is any further information I can provide.

-- Derrick DeConti

Answers

  • deconti (DFCI, Member)

    Thanks, Geraldine.

    Hopefully that support will arrive soon. In the meantime, I think I can work around the issue by mapping the input files to intervals and then scattering over a TSV that pairs each file with its intervals, roughly as sketched below.
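
    This is only a sketch of the workaround: it assumes the same HaplotypeCaller task definition as in the sub-workflow sketch above lives in the same file, the TSV columns are placeholders, and the per-sample gather of the interval GVCFs would happen in a follow-up step.

    workflow VariantCallingFlat {
      # Each TSV row: sample_name, bam, bam_index, interval_list
      File bam_intervals_tsv
      File ref_fasta
      File ref_fasta_index
      File ref_dict

      Array[Array[String]] bam_interval_pairs = read_tsv(bam_intervals_tsv)

      # One flat scatter over (sample, interval) pairs, so there is no nested
      # scatter and no sub-workflow import to resolve.
      # (The HaplotypeCaller task sketched above would be defined in this same file.)
      scatter (row in bam_interval_pairs) {
        call HaplotypeCaller {
          input:
            gvcf_basename = row[0],
            input_bam = row[1],
            input_bam_index = row[2],
            interval_list = row[3],
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict
        }
      }
    }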
