Problem in a task with piped command

Dear all in the WDL team,
I am trying to write a workflow of three tasks, as recommended in the GATK Best Practices, that goes from FASTQ files to an aligned BAM file. Each step worked when I ran it separately on the command line, but when I combine them into one WDL script I keep getting an error from the last step (the piped one). I suspect this is because I don't know how to use /dev/stdout and /dev/stdin correctly.
The error message is: "Could not process output, file not found: /home/projects/cu_10111/data/Test/TOY_piped.bam
java.lang.RuntimeException: Could not process output, file not found: /home/projects/cu_10111/data/Test/TOY_piped.bam"
The strange thing is that the missing file (TOY_piped.bam) is supposed to be the final output, so I don't understand why it is expected to exist already. I tried adding an options file to work around this, but it did not help.
The WDL script, inputs.json and options.json are shown below, in that order:
script.wdl
workflow FromFastqToVCF {
  File FASTQ1
  File FASTQ2
  String SAMPLENAME
  File REFFASTA
  File REFINDEX
  File REFDICT
  call FastqToSam {
    input:
      FastqR1=FASTQ1,
      FastqR2=FASTQ2,
      SampleName=SAMPLENAME
  }
  call MarkIlluminaAdapters {
    input:
      SampleName=SAMPLENAME,
      uBAM=FastqToSam.uBAM
  }
  call AllignedBAM {
    input:
      mBAM=MarkIlluminaAdapters.mBAM,
      refFasta=REFFASTA,
      uBAM=FastqToSam.uBAM,
      SampleName=SAMPLENAME
  }
}

task FastqToSam {
  File FastqR1
  File FastqR2
  String SampleName
  command {
    gatk FastqToSam \
      --FASTQ "${FastqR1}" \
      --FASTQ2 "${FastqR2}" \
      --OUTPUT "/home/projects/cu_10111/data/Test/${SampleName}_fastqtosam.bam" \
      --SAMPLE_NAME "${SampleName}"
  }
  output {
    File uBAM = "/home/projects/cu_10111/data/Test/${SampleName}_fastqtosam.bam"
  }
}

task MarkIlluminaAdapters {
  File uBAM
  String SampleName
  command {
    gatk MarkIlluminaAdapters \
      --INPUT "${uBAM}" \
      --METRICS "/home/projects/cu_10111/data/Test/${SampleName}_markilluminaadapters_metrics.txt" \
      --OUTPUT "/home/projects/cu_10111/data/Test/${SampleName}_markilluminaadapters.bam"
  }
  output {
    File mBAM = "/home/projects/cu_10111/data/Test/${SampleName}_markilluminaadapters.bam"
  }
}

task AllignedBAM {
  File mBAM
  String SampleName
  File refFasta
  File refIndex
  File refDict
  File uBAM
  command {
    set -o pipefail
    gatk SamToFastq \
      --INPUT ${mBAM} \
      --FASTQ /dev/stdout \
      --CLIPPING_ATTRIBUTE XT --CLIPPING_ACTION 2 --INTERLEAVE true --INCLUDE_NON_PF_READS true \
      --TMP_DIR /home/projects/cu_10111/data/Test/temp
    | \
    bwa mem -M -t 31 -p ${refFasta} /dev/stdin \
    | \
    gatk MergeBamAlignment \
      --REFERENCE_SEQUENCE ${refFasta} \
      --UNMAPPED_BAM ${uBAM} \
      --ALIGNED_BAM /dev/stdin \
      --CREATE_INDEX true --ADD_MATE_CIGAR true --CLIP_ADAPTERS false --CLIP_OVERLAPPING_READS true \
      --INCLUDE_SECONDARY_ALIGNMENTS true --MAX_INSERTIONS_OR_DELETIONS -1 --PRIMARY_ALIGNMENT_STRATEGY MostDistant \
      --ATTRIBUTES_TO_RETAIN XS \
      --OUTPUT /home/projects/cu_10111/data/Test/${SampleName}_piped.bam
      --TMP_DIR /home/projects/cu_10111/data/Test/temp
  }
  output {
    File BAM = "/home/projects/cu_10111/data/Test/${SampleName}_piped.bam"
  }
}

inputs.json
{
  "FromFastqToVCF.AllignedBAM.refDict": "/home/projects/cu_10111/data/Test/Homo_sapiens_assembly19.dict",
  "FromFastqToVCF.AllignedBAM.refIndex": "/home/projects/cu_10111/data/Test/Homo_sapiens_assembly19.fasta.fai",
  "FromFastqToVCF.SAMPLENAME": "TOY",
  "FromFastqToVCF.REFINDEX": "/home/projects/cu_10111/data/Test/Homo_sapiens_assembly19.fasta.fai",
  "FromFastqToVCF.FASTQ2": "/home/projects/cu_10111/data/Test/TOY_S1_L001_R2_001.fastq.gz",
  "FromFastqToVCF.REFDICT": "/home/projects/cu_10111/data/Test/Homo_sapiens_assembly19.dict",
  "FromFastqToVCF.FASTQ1": "/home/projects/cu_10111/data/Test/TOY_S1_L001_R1_001.fastq.gz",
  "FromFastqToVCF.REFFASTA": "/home/projects/cu_10111/data/Test/Homo_sapiens_assembly19.fasta"
}

options.json
{
  "default_runtime_attributes": {
    "continueOnReturnCode": true
  },
  "workflow_failure_mode": "ContinueWhilePossible",
  "write_to_cache": true,
  "read_from_cache": true
}

I would appreciate any help, especially since I am not a Linux person and this is all new to me.
Best
Nawar

Answers

  • NawarDalila Member

    Thank you, Chris. What you said was definitely part of the problem. The other part was how the bwa tool is used in WDL: I realized that I should have declared all the relevant index files (5 files), which one doesn't usually pass explicitly per se.
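
    Chris's reply is not shown in this thread, so what follows is only a hedged sketch of how the last task might look, not the fix that was actually applied. It addresses two things that can be seen in the posted script. First, the line before the first `|` and the `--OUTPUT` line have no trailing backslash, so the shell sees a line that starts with `|` and treats `--TMP_DIR` as a separate command. Combined with `continueOnReturnCode: true`, this would likely let Cromwell go on to look for an output file that was never written. Second, the five bwa index files (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) are declared as inputs so that they are localized alongside the reference. The input names `refAmb`, `refAnn`, `refBwt`, `refPac` and `refSa` are hypothetical, and the sketch assumes the index files sit in the same directory as the FASTA.

```wdl
task AllignedBAM {
  File mBAM
  File uBAM
  String SampleName
  File refFasta
  File refIndex
  File refDict
  # Hypothetical input names for the five bwa index files.
  # They are never referenced in the command; declaring them
  # is what makes them available next to refFasta at run time.
  File refAmb
  File refAnn
  File refBwt
  File refPac
  File refSa

  command {
    # Every line of the pipeline except the last ends in a backslash,
    # so the shell reads it as a single piped command.
    set -o pipefail
    gatk SamToFastq \
      --INPUT ${mBAM} \
      --FASTQ /dev/stdout \
      --CLIPPING_ATTRIBUTE XT --CLIPPING_ACTION 2 --INTERLEAVE true --INCLUDE_NON_PF_READS true \
      --TMP_DIR /home/projects/cu_10111/data/Test/temp \
    | \
    bwa mem -M -t 31 -p ${refFasta} /dev/stdin \
    | \
    gatk MergeBamAlignment \
      --REFERENCE_SEQUENCE ${refFasta} \
      --UNMAPPED_BAM ${uBAM} \
      --ALIGNED_BAM /dev/stdin \
      --CREATE_INDEX true --ADD_MATE_CIGAR true --CLIP_ADAPTERS false --CLIP_OVERLAPPING_READS true \
      --INCLUDE_SECONDARY_ALIGNMENTS true --MAX_INSERTIONS_OR_DELETIONS -1 --PRIMARY_ALIGNMENT_STRATEGY MostDistant \
      --ATTRIBUTES_TO_RETAIN XS \
      --OUTPUT /home/projects/cu_10111/data/Test/${SampleName}_piped.bam \
      --TMP_DIR /home/projects/cu_10111/data/Test/temp
  }
  output {
    File BAM = "/home/projects/cu_10111/data/Test/${SampleName}_piped.bam"
  }
}
```

    With this sketch, inputs.json would also need one entry per new input, following the same pattern as the existing `FromFastqToVCF.AllignedBAM.refDict` key. Removing `continueOnReturnCode` from the options file while debugging should make a failing command surface as a non-zero return code instead of a missing output file.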
