WDL Tutorial Scatter and Gather Does not Work with New Versions of Cromwell/WDL

Hi,
I have been following the WDL Scatter and Gather tutorial located here https://software.broadinstitute.org/wdl/documentation/article?id=7614. Some of the commands have changed in GATK4, so I modified my script. I can scatter HaplotypeCaller, but I cannot get the gather step to work in WDL. The GATK4 commands themselves work, but I cannot get the WDL version to work. I have tried two approaches:
GenomicsDBImport (Error: A USER ERROR has occurred: GenomicsDB workspace drivingVariantFile:gendb:///home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/cromwell-executions/test/cf118a1a-5c3e-4ab9-884e-a5a33d2cd5e5/call-GenotypeGVCFs/execution/my_database does not exist)
CombineGVCFs (Error: [2019-04-21 15:40:57,06] [error] BackgroundConfigAsyncJobExecutionActor [5373216etest.CombineGVCFs:NA:1]: Error attempting to Execute scala.MatchError: null)
I am using cromwell-40 and womtool-40 with gatk-4.1.1.0.
Here are some of the WDL scripts I have tried:
------------GenomicsDBImport----------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex
  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}
task Database {
  File GATK
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V GVCFs"} \
      -genomicsdb-workspace-path my_database \
      -L 20
  }
}
task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF = "${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}
workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples) {
    call HaplotypeCallerERC {
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call Database {
    input: GATK=gatk,
      sampleName="CEUtrio",
      GVCFs=HaplotypeCallerERC.GVCF
  }
  call GenotypeGVCFs {
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio"
  }
}
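One possible reading of the "GenomicsDB workspace ... does not exist" error: the path in the message points into call-GenotypeGVCFs/execution, which suggests Cromwell ran each call in its own execution directory, so the my_database folder written by Database was never visible to GenotypeGVCFs. The sketch below, in the same draft-2 WDL style, declares the workspace as an output and passes it to the next task as a tar file. The names workspaceTar, my_database.tar, gvcfs and importAndGenotype are illustrative, and the tar approach is an assumption, not something confirmed in this thread.

```wdl
task Database {
  File GATK
  Array[File] GVCFs

  command {
    # Build the GenomicsDB workspace inside this task's execution directory.
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V " GVCFs} \
      --genomicsdb-workspace-path my_database \
      -L 20
    # Pack the workspace folder into a single file so it can be a task output.
    tar -cf my_database.tar my_database
  }
  output {
    File workspaceTar = "my_database.tar"
  }
}

task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File workspaceTar

  command {
    # Unpack the workspace into this task's own execution directory.
    tar -xf ${workspaceTar}
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF = "${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}

workflow importAndGenotype {
  File gatk
  File refFasta
  File refIndex
  File refDict
  Array[File] gvcfs   # hypothetical input: gVCFs gathered from an earlier scatter

  call Database {
    input: GATK=gatk, GVCFs=gvcfs
  }
  call GenotypeGVCFs {
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      workspaceTar=Database.workspaceTar
  }
}
```

GenomicsDBImport typically also expects an index next to each gVCF, which this sketch does not pass along; that would need a separate output in the HaplotypeCaller task.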
----------------------CombineGVCFs--------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex
  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}
task CombineGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      CombineGVCFs \
      -R ${RefFasta} \
      -V ${sep=" -V GVCFs"} \
      -O ${sampleName}_combined.vcf
  }
  output {
    File combinedGVCF = "${sampleName}_combined.vcf"
  }
}
task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File combinedGVCF
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V ${combinedGVCF} \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF = "${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}
workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples) {
    call HaplotypeCallerERC {
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call CombineGVCFs {
    input: GVCFs=HaplotypeCallerERC.GVCF,
      GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
  }
  call GenotypeGVCFs {
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      combinedGVCF=CombineGVCFs.combinedGVCF
  }
}
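In draft-2 WDL, a sep placeholder takes the separator string in quotes, followed by the array expression outside the quotes. The -V placeholder in the tasks above has the variable name inside the quoted separator, which leaves the placeholder with no expression to expand. A toy sketch of the intended form follows; the task ShowSep, the workflow sepDemo and the file names are hypothetical and exist only to show how the placeholder expands.

```wdl
task ShowSep {
  Array[String] items

  command {
    # The separator " -V " is placed between the elements of the array.
    echo -V ${sep=" -V " items}
  }
  output {
    String rendered = read_string(stdout())
  }
}

workflow sepDemo {
  call ShowSep {
    input: items = ["a.g.vcf", "b.g.vcf", "c.g.vcf"]
  }
}
```

With these three items the command should expand to echo -V a.g.vcf -V b.g.vcf -V c.g.vcf. Note that the leading -V sits outside the placeholder, since sep only goes between elements.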
Thanks!
Answers
Hi Ricardo! I will pass this along for someone to take a look.
Thank you Tiffany
Hi Ricardo - I pasted your WDL files into IntelliJ and saw an error on the line
-V ${sep=" -V GVCFs"} \
You'll need to supply an array variable to that placeholder (e.g. -V ${sep=" -V GVCFs" my_variable} \).
Thanks, ChrisL,
I simplified the WDL script to include only the CombineGVCFs step and made the changes you suggested, but I am still getting an error. I believe I am supplying the Array[File]:
Failed to evaluate input 'GVCFs' (reason 1 of 1): No coercion defined from wom value(s) '[["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12877_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12878_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12882_rawLikelihoods.g.vcf"]]' of type 'Array[Array[File]]' to 'Array[File]'.
I am using the full paths to the VCF files in the inputs file.
WDL Script:
task CombineGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  File sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      CombineGVCFs \
      -R ${RefFasta} \
      -V ${sep=" -V GVCFs" GVCFs} \
      -O ${sampleName}_combined.vcf
  }
  output {
    File combinedGVCF = "${sampleName}_combined.vcf"
  }
}
workflow CombineGVCF {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  call CombineGVCFs {
    input:
      GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      GVCFs=inputSamples
  }
}
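The coercion error shows that read_tsv returned one single-element row per line (Array[Array[File]]), while the task expects a flat Array[File]. Assuming the inputs file lists one gVCF path per line, as the error message suggests, read_lines yields a flat array directly. The sketch below also declares sampleName as String, on the assumption that "CEUtrio" is a label rather than a file path, and moves the array variable out of the quoted separator. The name inputGVCFs is illustrative, and this is an untested suggestion rather than a confirmed fix.

```wdl
task CombineGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName     # assumed to be a label, so String rather than File
  Array[File] GVCFs

  command {
    java -jar ${GATK} \
      CombineGVCFs \
      -R ${RefFasta} \
      -V ${sep=" -V " GVCFs} \
      -O ${sampleName}_combined.vcf
  }
  output {
    File combinedGVCF = "${sampleName}_combined.vcf"
  }
}

workflow CombineGVCF {
  File inputSamplesFile
  # read_lines returns one String per line, which coerces to Array[File].
  Array[File] inputGVCFs = read_lines(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict

  call CombineGVCFs {
    input:
      GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      GVCFs=inputGVCFs
  }
}
```

If the inputs file had several columns per sample, read_tsv would still be the right reader, and the gVCF column would need to be picked out of each row before the call.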