Inconsistency in file paths when pipelining tasks

Hello,
I am trying to pass the output of one task call to the next task in my pipeline, and I have been struggling to get the input paths right.
My .wdl looks like this:
task pool_and_pseudoreplicate_complex {
    File tags_rep1
    File tags_rep2
    File tags_ctrl1
    File tags_ctrl2
    String rep1_paired_end
    String rep2_paired_end

    command {
        python /image_software/pipeline-container/src/pool_and_pseudoreplicate.py ${tags_rep1} ${tags_ctrl1} ${rep1_paired_end} ${tags_rep2} ${tags_ctrl2} ${rep2_paired_end}
    }

    output {
        Array[File] out_files = glob('*.gz')
        File results = glob('pool_and_pseudoreplicate_outfiles.mapping')[0]
    }

    runtime {
        docker: 'quay.io/ottojolanki/pool_and_pseudoreplicate:v1.11'
        cpu: '1'
        memory: '4.0 GB'
        disks: 'local-disk 30 HDD'
    }
}

task pool_and_pseudoreplicate_simple {
    File tags_rep1
    File tags_ctrl1
    String rep1_paired_end

    command {
        python /image_software/pipeline-container/src/pool_and_pseudoreplicate.py ${tags_rep1} ${tags_ctrl1} ${rep1_paired_end}
    }

    output {
        File rep1_pr1 = glob('*pr1.tagAlign.gz')[0]
        File rep1_pr2 = glob('*pr2.tagAlign.gz')[0]
        File results = glob('pool_and_pseudoreplicate_outfiles.mapping')[0]
    }

    runtime {
        docker: 'quay.io/ottojolanki/pool_and_pseudoreplicate:v1.11'
        cpu: '1'
        memory: '4.0 GB'
        disks: 'local-disk 30 HDD'
    }
}

task xcor {
    File tags
    String paired_end

    command {
        python /image_software/pipeline-container/src/xcor_only.py ${tags} ${paired_end}
    }

    output {
        File xcor_scores = glob('*.cc.qc')[0]
        File xcor_plot = glob('*.cc.plot.pdf')[0]
    }

    runtime {
        docker: 'quay.io/ottojolanki/xcor_only:test3'
        cpu: '1'
        memory: '4.0GB'
        disks: 'local-disk 30 HDD'
    }
}

task output_defined {
    File is_this_def
    File is_this_def2
    String paired_end

    command {
        echo "the input is defined!"
        echo ${is_this_def}
        echo ${is_this_def2}
        echo ${paired_end}
    }

    runtime {
        docker: 'ubuntu:latest'
        cpu: '1'
        memory: '4.0GB'
        disks: 'local-disk 30 HDD'
    }
}

#WORKFLOW DEFINITION
workflow pool_and_pseudoreplicate_workflow {
    File tags_rep1
    File? tags_rep2
    File tags_ctrl1
    File? tags_ctrl2
    String rep1_paired_end
    String? rep2_paired_end
    #String genomesize
    #File chrom_sizes
    #File narrowpeak_as
    #File gappedpeak_as
    #File broadpeak_as

    if(defined(tags_rep2)){
        call pool_and_pseudoreplicate_complex {
            input:
                tags_rep1=tags_rep1,
                tags_rep2=tags_rep2,
                tags_ctrl1=tags_ctrl1,
                tags_ctrl2=tags_ctrl2,
                rep1_paired_end=rep1_paired_end,
                rep2_paired_end=rep2_paired_end
        }
    }

    if(!defined(tags_rep2)){
        call pool_and_pseudoreplicate_simple {
            input:
                tags_rep1=tags_rep1,
                tags_ctrl1=tags_ctrl1,
                rep1_paired_end=rep1_paired_end
        }
        call output_defined {
            input:
                is_this_def=pool_and_pseudoreplicate_simple.rep1_pr1,
                is_this_def2=pool_and_pseudoreplicate_simple.rep1_pr2,
                paired_end=rep1_paired_end
        }
        call xcor {
            input:
                tags = pool_and_pseudoreplicate_simple.rep1_pr1,
                paired_end = rep1_paired_end
        }
    }
}
And my inputs .json is:
{
    "pool_and_pseudoreplicate_workflow.tags_rep1": "rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz",
    "pool_and_pseudoreplicate_workflow.tags_ctrl1": "ctl1_chr21.raw.srt.filt.srt.nodup.PE2SE.tagAlign.gz",
    "pool_and_pseudoreplicate_workflow.rep1_paired_end": "False"
}
I run cromwell 28 in the working directory
/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/
where I keep the .wdl, the .json, and the input files.
The first task does some simple subsampling and outputs the result files. It works correctly. I was having trouble getting the subsequent xcor task to work, so for debugging purposes I added the output_defined task to check that the output can actually be passed along. When I run my workflow:
DN0a22f0dd:pool_and_pseudoreplicate_test_data otto$ java -jar cromwell-28_2.jar run pool_and_pseudoreplicate_workflow.wdl rep_inputs_simple.json
in addition to other output there are some lines that confuse me. The commands run in output_defined are (correctly):
[2017-08-25 10:38:34,06] [info] BackgroundConfigAsyncJobExecutionActor [aa3ba195pool_and_pseudoreplicate_workflow.output_defined:NA:1]:
echo "the input is defined!"
echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz
echo /Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-output_defined/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-68b15494bd1ce67f1f051918d3136843/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr2.tagAlign.gz
echo False
while the input to the xcor task call seems to get cut for some reason:
python /image_software/pipeline-container/src/xcor_only.py /cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-xcor/inputs/Users/otto/github/pipeline-container/local-workflows/pool_and_pseudoreplicate_test_data/cromwell-executions/pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8/call-pool_and_pseudoreplicate_simple/execution/glob-aefc71437f2745efd61690b3747de0b1/rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz False
Does anyone have an idea why the input file paths are complete in the first call, while in the second call they are handled as if the cromwell-executions directory were directly in the root of the filesystem?
Answers
File paths in Dockerized tasks should begin with /cromwell-executions, that's the appropriate view of the file layout from inside the container. That path in xcor is some bizarre and very wrong chimera. I suspect something is going wrong with the glob, I'll try to work up a minimal test case for that.
Thanks so much for looking at this. My head is getting quite sore from hitting the wall again, and again.
And actually I think I may have been wrong about that chimeric path, that looks to be an intentional duplication of the input directory structure to prevent inputs with the same filenames from colliding within the container.
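To make the path layout above concrete, here is a minimal Python sketch that rebuilds the xcor input path from the pieces visible in the logs. It is only a reconstruction of the observed paths, not Cromwell's actual implementation; the function names `localized_input_path` and `container_view` are hypothetical, and the rule "mirror the absolute source path under inputs/, then re-root at /cromwell-executions" is an assumption inferred from this thread.

```python
from pathlib import PurePosixPath


def localized_input_path(call_root: str, source_path: str) -> str:
    """Mirror an absolute source path under <call_root>/inputs (assumed layout)."""
    source = PurePosixPath(source_path)
    if not source.is_absolute():
        raise ValueError("expected an absolute source path")
    # Drop the leading "/" so the whole source path nests under inputs/.
    return str(PurePosixPath(call_root) / "inputs" / source.relative_to("/"))


def container_view(host_path: str, host_executions_root: str) -> str:
    """Re-root a host path at /cromwell-executions, as seen inside the container."""
    relative = PurePosixPath(host_path).relative_to(host_executions_root)
    return str(PurePosixPath("/cromwell-executions") / relative)


# Values copied from the log output quoted in the question.
HOST_ROOT = (
    "/Users/otto/github/pipeline-container/local-workflows/"
    "pool_and_pseudoreplicate_test_data/cromwell-executions"
)
WF = "pool_and_pseudoreplicate_workflow/aa3ba195-c676-4d0a-8255-a03905de56d8"
source = (
    f"{HOST_ROOT}/{WF}/call-pool_and_pseudoreplicate_simple/execution/"
    "glob-aefc71437f2745efd61690b3747de0b1/"
    "rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz"
)

host_localized = localized_input_path(f"{HOST_ROOT}/{WF}/call-xcor", source)
in_container = container_view(host_localized, HOST_ROOT)
print(in_container)
```

Under these assumptions, the long path echoed by output_defined and the shorter one passed to xcor_only.py would name the same kind of localized copy, once seen from the host and once from inside the container.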
What exactly is the problem you're seeing with xcor?
The full output from the run looks like this.
stderr looks like this.
my stdout:
The problem arises from some of the intermediate files getting lost. All of the code I have put in the container runs without problems on a standard Ubuntu machine, so I think something goes wrong with the file paths somewhere along the pipeline.
The (in state of conversion from former architecture) source code and the Dockerfiles (I wholeheartedly understand that digging into other people's source may not be your preferred Friday afternoon activity) can be found in the repo here: https://github.com/ENCODE-DCC/pipeline-container/tree/xcor_and_macs_workflow Thank you again!
Hi @ojolanki, is that gz file supposed to get un-gzipped once it gets into xcor? The only file localized into the xcor container is called rep1_chr21.raw.srt.filt.srt.nodup.PE2SE.SE.pr1.tagAlign.gz. There are errors that say
and
That latter one with suffixes appended to a gz file looks particularly questionable.
This might be working with non-Dockerized tasks since non-Dockerized tasks can see the files from other non-Dockerized tasks, but Dockerized tasks cannot.
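If the script does need the uncompressed file, one way to avoid depending on files that were never localized is to decompress the single .gz input into the task's own working directory and derive later filenames from the stripped name. The sketch below is illustrative only: `strip_gz` and `gunzip_into` are hypothetical helpers, and nothing here reflects what xcor_only.py actually does.

```python
import gzip
import shutil
from pathlib import Path


def strip_gz(name: str) -> str:
    """Return the filename without a trailing .gz, if present."""
    return name[:-3] if name.endswith(".gz") else name


def gunzip_into(src: str, work_dir: str = ".") -> Path:
    """Decompress the gzip file src into work_dir and return the new path."""
    dest = Path(work_dir) / strip_gz(Path(src).name)
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

Writing into the working directory rather than next to the localized input also keeps the outputs where the task's glob() expressions look for them.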
That makes a lot of sense. I will investigate. Thanks for the help.