How to identify different members of scatter output array?

I'm trying to run more commands on the output array of a scatter function. How can I pass each file of the array separately as an input to the following function? All I could find is how to have the array output gathered into a single input, using different forms of the "sep" syntax.
Best Answer
It seems you have not yet embraced the full power of the dark side of the FORCE, er, Cromwell. It is Cromwell's task to keep track of the different files, so that you as a user don't have to mess about with sample numbers. This is an excerpt of the workflow I use to trim and map reads.

    workflow trimAndMap {
      File inputFastqFile
      Int nrCores
      # One row per sample: name, forward reads, reverse reads
      Array[Array[File]] inputFastq = read_tsv(inputFastqFile)

      scatter (sample in inputFastq) {
        call trimmomatic {
          input:
            samplename = sample[0],
            forward = sample[1],
            reverse = sample[2],
            nrCores = nrCores
        }
        # Uses the trimmed files of the same sample (same scatter shard)
        call map {
          input:
            forward = trimmomatic.forward,
            reverse = trimmomatic.reverse,
            samplename = sample[0],
            nrCores = nrCores
        }
      }
    }

This way, Cromwell will make sure that the name samplename1 will stay associated with the correct forward and reverse files, even after trimming. I then use the samplename in the output filename for the mapping, but that is only so I can easily recognize the files after the analysis has run. I never use that filename within Cromwell or WDL to identify which files belong to which sample. That is something that Cromwell does for me automatically.

The inputFastqFile (tab-separated) looks like this:

    samplename1    /path/to/forward.fastq.gz     /path/to/reverse.fastq.gz
    sample2        /path/to/forward2.fastq.gz    /path/to/reverse2.fastq.gz
Answers
The easiest way is to just add those calls inside the scatter block as well.
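A minimal sketch of that suggestion, written in the same pre-1.0 WDL style as the excerpt in the best answer. The task names (stepOne, stepTwo, gatherAll) and the workflow name are hypothetical, and cp and cat are placeholders for the real tools. It shows that an output referenced inside the scatter is a single File for the current element, while the same output referenced outside the scatter is an Array[File].

```wdl
# Hypothetical placeholder tasks; cp and cat stand in for real tools.
task stepOne {
  File inFile
  command {
    cp ${inFile} one.out
  }
  output {
    File out = "one.out"
  }
}

task stepTwo {
  File inFile
  command {
    cp ${inFile} two.out
  }
  output {
    File out = "two.out"
  }
}

task gatherAll {
  Array[File] files
  command {
    cat ${sep=" " files} > all.out
  }
  output {
    File merged = "all.out"
  }
}

workflow scatterChain {
  Array[File] inputFiles

  scatter (f in inputFiles) {
    call stepOne { input: inFile = f }
    # Inside the scatter, stepOne.out is the single File of this shard.
    call stepTwo { input: inFile = stepOne.out }
  }

  # Outside the scatter, stepTwo.out is an Array[File], one entry per shard.
  call gatherAll { input: files = stepTwo.out }
}
```

The fixed file names (one.out, two.out) do not collide, because each shard of each call runs in its own execution directory.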
Thank you Redmar for the answer, but the workflow I am running is a bit more complicated. Can you please help me with it? The first task is already a pipeline of multiple steps (fastqToSam, then markIlluminaAdapter) whose output I need to use in the next task (samToFastq, BWA, then MergeBamAlignment). The sample number is a very important identifier for the different files.
I'm assigning the sample number from the samples_input txt file, and then using it in the inputs and outputs of the Picard and GATK tools. Therefore, the outputs have ${Sample_number} as part of the output file name.
When I try to run the next task under the same scatter function as you suggested, I need to define the input for the next task when I call it. I can still pass Sample_number as an input, but I can't use ${Sample_number} as part of the input line (e.g. sample=trim.trimmed_${Sample_number}).
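One way to look at the problem above: the sample number belongs in the file name on disk, not in the name used to reference the output. A minimal sketch in the same pre-1.0 WDL style as the excerpt in the best answer, assuming hypothetical task names (fastqToSam, markAdapters), a hypothetical two-column tab-separated sample sheet (sample number, FASTQ path), and cp as a placeholder for the real Picard commands:

```wdl
# Hypothetical task: cp stands in for the real Picard command.
task fastqToSam {
  String sampleNumber
  File fastq
  command {
    cp ${fastq} ${sampleNumber}.unaligned.bam
  }
  output {
    # The output NAME is fixed; only the file name carries the sample number.
    File unalignedBam = "${sampleNumber}.unaligned.bam"
  }
}

# Hypothetical task: cp stands in for the real Picard command.
task markAdapters {
  String sampleNumber
  File bam
  command {
    cp ${bam} ${sampleNumber}.marked.bam
  }
  output {
    File markedBam = "${sampleNumber}.marked.bam"
  }
}

workflow perSample {
  File samplesFile
  Array[Array[String]] samples = read_tsv(samplesFile)

  scatter (sample in samples) {
    call fastqToSam {
      input:
        sampleNumber = sample[0],
        fastq = sample[1]
    }
    # Reference the output by its declared name, with no sample number.
    call markAdapters {
      input:
        sampleNumber = sample[0],
        bam = fastqToSam.unalignedBam
    }
  }
}
```

Within one pass of the scatter, fastqToSam.unalignedBam always refers to the file produced for the current sample, so an expression like trim.trimmed_${Sample_number} is not needed.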
Redmar! Thanks a lot. I've just had an awesome taste of Cromwell's power. I never thought that was possible.
I appreciate your help. Can you point me to where I can learn more about the dark side of the FORCE, er, Cromwell's power?
I would definitely start with some of the WDL tutorials if you haven't explored them yet. Other resources that could be explored...
Are there specific Cromwell options you're interested in learning about?
Thank you @Ruchi !
I've completed the tutorials; they were really informative. The example WDL is definitely a comprehensive way of learning about the different behaviors of Cromwell.
I'm working on developing pipelines for servers that deal with multiple samples. I'm also trying to implement some loops within analyses, which might make it a bit sophisticated.
I need to know as much as possible about Cromwell to do additional work on parallel computing and enhance the efficiency of pipelines.
Thanks again! I appreciate your kind help.