
WDL Tutorial Scatter and Gather Does not Work with New Versions of Cromwell/WDL

Hi,

I have been following the WDL Scatter and Gather tutorial at https://software.broadinstitute.org/wdl/documentation/article?id=7614. Some of the commands have changed in GATK4, so I modified my script. I can scatter HaplotypeCaller, but I cannot get the gather step to work in WDL. The GATK4 commands work on their own, but the WDL versions do not.

I have tried GenomicsDBImport, which fails with:

A USER ERROR has occurred: GenomicsDB workspace drivingVariantFile:gendb:///home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/cromwell-executions/test/cf118a1a-5c3e-4ab9-884e-a5a33d2cd5e5/call-GenotypeGVCFs/execution/my_database does not exist

and CombineGVCFs, which fails with:

[2019-04-21 15:40:57,06] [error] BackgroundConfigAsyncJobExecutionActor [5373216etest.CombineGVCFs:NA:1]: Error attempting to Execute
scala.MatchError: null

I am using cromwell-40 and womtool-40 with gatk-4.1.1.0.

Here are some of the WDL scripts I have tried:

------------GenomicsDBImport----------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex

  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}

task Database {
  File GATK
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V GVCFs"} \
      -genomicsdb-workspace-path my_database \
      -L 20
  }
}

task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF="${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}

workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples) {
    call HaplotypeCallerERC {
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call Database {
    input: GATK=gatk,
      sampleName="CEUtrio",
      GVCFs=HaplotypeCallerERC.GVCF
  }
  call GenotypeGVCFs {
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio"
  }
}
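
A likely cause of the "workspace does not exist" error with the script above: Cromwell runs every call in its own execution directory, and the Database task declares no outputs, so my_database is never handed to GenotypeGVCFs (the path in the error points into call-GenotypeGVCFs/execution). Below is a minimal sketch of one possible workaround, assuming it is acceptable to tar the workspace and pass it along as a File. The names workspaceTar and my_database.tar are illustrative and do not come from the tutorial, and the placeholder is written with the array variable as suggested later in this thread.

```wdl
# Sketch only: pack the GenomicsDB workspace so it can travel between tasks.
task Database {
  File GATK
  String sampleName
  Array[File] GVCFs

  command {
    set -e
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V " GVCFs} \
      --genomicsdb-workspace-path my_database \
      -L 20
    # The workspace is a directory, so bundle it into a single file.
    tar -cf my_database.tar my_database
  }
  output {
    # Hypothetical output name; declaring it is what exposes the workspace.
    File workspaceTar = "my_database.tar"
  }
}

task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File workspaceTar

  command {
    set -e
    # Recreate my_database inside this task's own execution directory.
    tar -xf ${workspaceTar}
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF = "${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}
```

In the workflow, the GenotypeGVCFs call would then take workspaceTar=Database.workspaceTar. That reference also gives Cromwell a dependency between the two calls, so GenotypeGVCFs waits for Database to finish.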
----------------------CombineGVCFs--------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex

  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}

task CombineGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      CombineGVCFs \
      -R ${RefFasta} \
      -V ${sep=" -V GVCFs"} \
      -O ${sampleName}_combined.vcf
  }
  output {
    File combinedGVCF="${sampleName}_combined.vcf"
  }
}

task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File combinedGVCF
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V ${combinedGVCF} \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF="${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}

workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples) {
    call HaplotypeCallerERC {
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call CombineGVCFs {
    input: GVCFs=HaplotypeCallerERC.GVCF,
      GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
  }
  call GenotypeGVCFs {
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      combinedGVCF=CombineGVCFs.combinedGVCF
  }
}

Thanks!

Answers

  • Tiffany_at_Broad (Cambridge, MA; Member, Administrator, Broadie, Moderator, admin)

    Hi Ricardo! I will pass this along for someone to take a look.

  • RicardoHarripaul (Toronto; Member)

    Thank you Tiffany

  • ChrisL (Cambridge, MA; Member, Broadie, Dev, admin)

    Hi Ricardo - I pasted your WDL files into IntelliJ and saw an error on the line -V ${sep=" -V GVCFs"} \ - you'll need to supply an array variable to that placeholder (e.g. -V ${sep=" -V GVCFs" my_variable} \).
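
    To spell out the placeholder syntax: in draft-2 WDL the sep string is only the text inserted between elements, and the array variable follows it inside the same braces. A minimal sketch, assuming the intent is one -V flag per GVCF, in which case the separator would be " -V " rather than " -V GVCFs":

    ```wdl
    # Sketch of the CombineGVCFs task with the array variable supplied.
    task CombineGVCFs {
      File GATK
      File RefFasta
      File RefIndex
      File RefDict
      String sampleName
      Array[File] GVCFs

      command {
        java -jar ${GATK} \
          CombineGVCFs \
          -R ${RefFasta} \
          -V ${sep=" -V " GVCFs} \
          -O ${sampleName}_combined.vcf
      }
      output {
        File combinedGVCF = "${sampleName}_combined.vcf"
      }
    }
    ```

    With GVCFs holding a.g.vcf, b.g.vcf and c.g.vcf (placeholder names), that line renders as -V a.g.vcf -V b.g.vcf -V c.g.vcf.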

  • RicardoHarripaul (Toronto; Member)

    Thanks, ChrisL,
    I simplified the WDL script to include only the CombineGVCFs step and made the change you suggested, but I am still getting the same error. I believe I am supplying the Array[File]:

    Failed to evaluate input 'GVCFs' (reason 1 of 1): No coercion defined from wom value(s) '[["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12877_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12878_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12882_rawLikelihoods.g.vcf"]]' of type 'Array[Array[File]]' to 'Array[File]'.

    I am using the full paths to the VCF files in the inputs file.
    WDL Script:

    task CombineGVCFs {
      File GATK
      File RefFasta
      File RefIndex
      File RefDict
      File sampleName
      Array[File] GVCFs
      command {
        java -jar ${GATK} \
          CombineGVCFs \
          -R ${RefFasta} \
          -V ${sep=" -V GVCFs" GVCFs} \
          -O ${sampleName}_combined.vcf
      }
      output {
        File combinedGVCF="${sampleName}_combined.vcf"
      }
    }

    workflow CombineGVCF {
      File inputSamplesFile
      Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
      File gatk
      File refFasta
      File refIndex
      File refDict

      call CombineGVCFs {
        input:
          GATK=gatk,
          RefFasta=refFasta,
          RefIndex=refIndex,
          RefDict=refDict,
          sampleName="CEUtrio",
          GVCFs=inputSamples
      }
    }
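
    The coercion error above follows from the types: read_tsv returns one array per row, so inputSamples is an Array[Array[File]] even when each row has a single column, while the task expects a flat Array[File]. A minimal sketch of one way around it, assuming the inputs file lists exactly one GVCF path per line (which the error message suggests); inputGVCFs is an illustrative name:

    ```wdl
    # Sketch only: reuses the CombineGVCFs task from the post above.
    workflow CombineGVCF {
      File inputSamplesFile
      # read_lines returns Array[String], one element per line;
      # the declaration coerces it to Array[File].
      Array[File] inputGVCFs = read_lines(inputSamplesFile)
      File gatk
      File refFasta
      File refIndex
      File refDict

      call CombineGVCFs {
        input:
          GATK=gatk,
          RefFasta=refFasta,
          RefIndex=refIndex,
          RefDict=refDict,
          sampleName="CEUtrio",
          GVCFs=inputGVCFs
      }
    }
    ```

    Separately, the task in this post declares sampleName as File, whereas the earlier scripts declare it as String. Since "CEUtrio" is a label rather than a path, String is probably what was intended.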
