Forum Login Issue:
Currently the "Log in with Google" button redirects you to a "Page not found" page. Our forum vendors are working on a fix. In the meantime, while on the "Page not found" page, you can edit the URL to delete the second gatk, firecloud, or wdl (depending on which subforum you are accessing).
ex: https://gatkforums.broadinstitute.org/gatk/gatk/entry/...
scatter-gather multiple paired-end alignment (bwa)

Hi,
I have a bunch of samples that, for now, are described in a JSON file like the one below. Each sample is split into multiple R1/R2 FASTQ pairs, i.e. one pair per sequencing lane. I want to align each lane separately, add a read group (RG) tag, and then merge all lanes of a sample into one BAM file. What would be the best way to do that in WDL? I have already written the tasks for alignment and SAM-to-BAM conversion, as well as the downstream analysis on the merged BAM (MarkDuplicates, base quality score recalibration, etc.), so some help bridging the gap would be great!
Thank you
{
  "sample": [
    {
      "sample_name": "Sample_1",
      "reads": [
        {
          "R1": "S1_l1_R1.fastq.gz",
          "R2": "S1_l1_R2.fastq.gz",
          "lane": "l1"
        },
        {
          "R1": "S1_l2_R1.fastq.gz",
          "R2": "S1_l2_R2.fastq.gz",
          "lane": "l2"
        },
        {
          "R1": "S1_l3_R1.fastq.gz",
          "R2": "S1_l3_R2.fastq.gz",
          "lane": "l3"
        }
      ]
    },
    {
      "sample_name": "Sample_2",
      "reads": [
        {
          "R1": "S2_l1_R1.fastq.gz",
          "R2": "S2_l1_R2.fastq.gz",
          "lane": "l1"
        },
        {
          "R1": "S2_l2_R1.fastq.gz",
          "R2": "S2_l2_R2.fastq.gz",
          "lane": "l2"
        },
        {
          "R1": "S2_l3_R1.fastq.gz",
          "R2": "S2_l3_R2.fastq.gz",
          "lane": "l3"
        }
      ]
    }
  ]
}
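In WDL this kind of input usually maps onto a nested scatter: an outer scatter over samples and an inner scatter over each sample's lanes. The Python sketch below only models that data flow on the JSON above, so the structure is easy to see before writing the WDL. The read-group layout (ID as sample.lane, PL:ILLUMINA) and the per-lane BAM names are assumptions for illustration, not taken from any existing pipeline.

```python
import json

# Abbreviated copy of the question's JSON (two samples, fewer lanes).
INPUTS = """
{"sample": [
  {"sample_name": "Sample_1", "reads": [
    {"R1": "S1_l1_R1.fastq.gz", "R2": "S1_l1_R2.fastq.gz", "lane": "l1"},
    {"R1": "S1_l2_R1.fastq.gz", "R2": "S1_l2_R2.fastq.gz", "lane": "l2"}]},
  {"sample_name": "Sample_2", "reads": [
    {"R1": "S2_l1_R1.fastq.gz", "R2": "S2_l1_R2.fastq.gz", "lane": "l1"}]}
]}
"""


def read_group(sample: str, lane: str) -> str:
    # Assumed layout: ID unique per sample+lane, SM shared by all lanes of a sample.
    # bwa mem -R accepts literal "\t" escapes, so they are kept unexpanded here.
    return f"@RG\\tID:{sample}.{lane}\\tSM:{sample}\\tPL:ILLUMINA"


def plan(inputs: dict) -> dict[str, list[dict]]:
    """Model the nested scatter: outer loop = samples, inner loop = lanes."""
    jobs: dict[str, list[dict]] = {}
    for sample in inputs["sample"]:  # outer scatter: one iteration per sample
        name = sample["sample_name"]
        jobs[name] = [
            {
                "r1": pair["R1"],
                "r2": pair["R2"],
                "rg": read_group(name, pair["lane"]),
                "bam": f"{name}.{pair['lane']}.bam",  # hypothetical output name
            }
            for pair in sample["reads"]  # inner scatter: one job per lane
        ]
    return jobs


jobs = plan(json.loads(INPUTS))
for sample, lanes in jobs.items():
    # "Gather": all lane BAMs of one sample feed a single merge step.
    print(sample, [job["bam"] for job in lanes])
```

In WDL terms, a call made inside the inner scatter is seen as an `Array[File]` once you step back out into the outer (per-sample) scatter, so a merge task placed there (for example one wrapping Picard MergeSamFiles) can take that array directly and hand a single BAM to the downstream tasks.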
Answers
Here's a figure to explain what I want to do:

You might be looking for Cromwell's batch submission feature, which lets you submit several workflows with one command. I've never used it myself, so I can't show you how to do it, but hopefully the documentation helps: http://cromwell.readthedocs.io/en/develop/api/RESTAPI/#submit-a-batch-of-workflows-for-execution
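As described in the linked documentation, the batch endpoint takes its inputs as a JSON array with one input object per workflow run. A minimal sketch of splitting the multi-sample JSON into such an array, assuming a hypothetical per-sample workflow named `AlignSample` that declares `sample_name` and `reads` inputs (with `reads` as an array of a struct holding R1, R2, and lane):

```python
import json

# Hypothetical workflow name; replace with the name used in your own WDL.
WORKFLOW = "AlignSample"


def batch_inputs(inputs: dict) -> list[dict]:
    """Turn the multi-sample JSON into one input object per workflow run."""
    return [
        {
            f"{WORKFLOW}.sample_name": sample["sample_name"],
            f"{WORKFLOW}.reads": sample["reads"],
        }
        for sample in inputs["sample"]
    ]


multi = {
    "sample": [
        {"sample_name": "Sample_1",
         "reads": [{"R1": "S1_l1_R1.fastq.gz", "R2": "S1_l1_R2.fastq.gz", "lane": "l1"}]},
        {"sample_name": "Sample_2",
         "reads": [{"R1": "S2_l1_R1.fastq.gz", "R2": "S2_l1_R2.fastq.gz", "lane": "l1"}]},
    ]
}

batch = batch_inputs(multi)
# This array is what would go into the workflowInputs part of the batch request.
payload = json.dumps(batch, indent=2)
print(payload)
```

Each element then runs as an independent workflow, so the per-sample WDL only needs to scatter over lanes and merge.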
I'd also like to toot my own horn and suggest my pipeline as inspiration for your solution. It definitely can't handle exactly what you want, but it has an easy way of defining the input files and setting up the scatter-gather operation.
To begin with, here's the repository with the code: https://github.com/oskarvid/wdl_germline_pipeline/
Take a look at the sample input file here: https://github.com/oskarvid/wdl_germline_pipeline/blob/master/intervals/template_sample_manifest.tsv
I've pasted the relevant line below:
As you can see, the read group and file paths are defined in this file. If you look at the scatter code (https://github.com/oskarvid/wdl_germline_pipeline/blob/master/germlinevarcall.wdl#L49), you'll see how the read group information from the sample file is used in the WDL script. However, the pipeline isn't designed to handle multiple samples, only multiple input files for a single sample, so it doesn't do everything you want. If you need to handle multiple samples, I think you'll be better off using the batch execution function.
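The manifest approach boils down to one row per lane, with the read group next to the FASTQ paths. In WDL such a file is typically loaded with `read_tsv`, which returns an `Array[Array[String]]` that a scatter can iterate over, indexing the columns of each row. The sketch below converts the question's JSON into one tab-separated manifest per sample (since the pipeline handles one sample per run). The column order (read-group ID, sample, R1, R2) is an assumption for illustration; check the repository's template_sample_manifest.tsv for the layout the pipeline actually expects.

```python
import csv
import io


def to_manifests(inputs: dict) -> dict[str, str]:
    """Return one TSV manifest per sample; columns are an assumed order:
    read-group ID, sample name, R1 path, R2 path."""
    manifests: dict[str, str] = {}
    for sample in inputs["sample"]:
        name = sample["sample_name"]
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        for pair in sample["reads"]:  # one row per lane
            writer.writerow([f"{name}.{pair['lane']}", name, pair["R1"], pair["R2"]])
        manifests[name] = buf.getvalue()
    return manifests


example = {"sample": [{"sample_name": "Sample_1", "reads": [
    {"R1": "S1_l1_R1.fastq.gz", "R2": "S1_l1_R2.fastq.gz", "lane": "l1"},
    {"R1": "S1_l2_R1.fastq.gz", "R2": "S1_l2_R2.fastq.gz", "lane": "l2"},
]}]}

manifests = to_manifests(example)
print(manifests["Sample_1"], end="")
```

Each resulting file could then be passed to a single-sample run, and the runs combined through batch submission.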