WDL Tutorial Scatter and Gather Does not Work with New Versions of Cromwell/WDL

Hi,

I have been following the WDL Scatter and Gather tutorial at https://software.broadinstitute.org/wdl/documentation/article?id=7614. Some of the commands have changed in GATK4, so I modified my script accordingly. I can scatter HaplotypeCaller, but I cannot get the gather step to run through WDL. The GATK4 commands work on their own, but the WDL version does not. I have tried two approaches.

GenomicsDBImport fails with:
A USER ERROR has occurred: GenomicsDB workspace drivingVariantFile:gendb:///home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/cromwell-executions/test/cf118a1a-5c3e-4ab9-884e-a5a33d2cd5e5/call-GenotypeGVCFs/execution/my_database does not exist

CombineGVCFs fails with:
[2019-04-21 15:40:57,06] [error] BackgroundConfigAsyncJobExecutionActor [5373216etest.CombineGVCFs:NA:1]: Error attempting to Execute
scala.MatchError: null

I am using cromwell-40 and womtool-40 with gatk-4.1.1.0.

Here are some of the WDL scripts I have tried:
------------GenomicsDBImport----------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex

  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}

task Database{
  File GATK
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V GVCFs"} \
      -genomicsdb-workspace-path my_database \
      -L 20
  }
}
task GenotypeGVCFs{
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF="${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}

workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples){
    call HaplotypeCallerERC{
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call Database {
    input: GATK=gatk,
      sampleName="CEUtrio",
      GVCFs=HaplotypeCallerERC.GVCF
  }
  call GenotypeGVCFs{
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio"
  }
}
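
The GenomicsDBImport error quoted above points at call-GenotypeGVCFs/execution/my_database. Cromwell runs each call in its own execution directory, so a workspace written by the Database call is not visible to GenotypeGVCFs unless it is declared as an output and passed along. Below is a minimal sketch of one way to do that, assuming the workspace is archived with tar and handed over as a File. The names workspaceTar and my_database.tar are introduced here for illustration, and the double-dash --genomicsdb-workspace-path and the " -V " separator are the forms assumed for GATK 4.1. Index handling for the gVCFs is not shown.

```wdl
task Database {
  File GATK
  Array[File] GVCFs

  command {
    java -jar ${GATK} \
      GenomicsDBImport \
      -V ${sep=" -V " GVCFs} \
      --genomicsdb-workspace-path my_database \
      -L 20
    # Package the workspace directory so it can be declared as a File output.
    tar -cf my_database.tar my_database
  }
  output {
    # Hypothetical output name; any File output would do.
    File workspaceTar = "my_database.tar"
  }
}

task GenotypeGVCFs {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File workspaceTar

  command {
    # Unpack the workspace into this call's own execution directory.
    tar -xf ${workspaceTar}
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V gendb://my_database \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF = "${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}
```

In the workflow, the GenotypeGVCFs call would then take workspaceTar=Database.workspaceTar as an extra input, which also makes Cromwell wait for Database to finish before starting GenotypeGVCFs.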
----------------------CombineGVCFs--------------------
task HaplotypeCallerERC {
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File bamFile
  File bamIndex

  command {
    java -jar ${GATK} \
      HaplotypeCaller \
      -ERC GVCF \
      -R ${RefFasta} \
      -I ${bamFile} \
      -O ${sampleName}_rawLikelihoods.g.vcf
  }
  output {
    File GVCF = "${sampleName}_rawLikelihoods.g.vcf"
  }
}

task CombineGVCFs{
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  Array[File] GVCFs
  command {
    java -jar ${GATK} \
      CombineGVCFs \
      -R ${RefFasta} \
      -V ${sep=" -V GVCFs"} \
      -O ${sampleName}_combined.vcf
  }
  output {
    File combinedGVCF="${sampleName}_combined.vcf"
  }
}

task GenotypeGVCFs{
  File GATK
  File RefFasta
  File RefIndex
  File RefDict
  String sampleName
  File combinedGVCF
  command {
    java -jar ${GATK} \
      GenotypeGVCFs \
      -R ${RefFasta} \
      -V ${combinedGVCF} \
      -O ${sampleName}_rawLikelihoods_cohort.g.vcf
  }
  output {
    File GenotypedVCF="${sampleName}_rawLikelihoods_cohort.g.vcf"
  }
}

workflow test {
  File inputSamplesFile
  Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
  File gatk
  File refFasta
  File refIndex
  File refDict
  scatter (sample in inputSamples){
    call HaplotypeCallerERC{
      input: GATK=gatk,
        RefFasta=refFasta,
        RefIndex=refIndex,
        RefDict=refDict,
        sampleName=sample[0],
        bamFile=sample[1],
        bamIndex=sample[2]
    }
  }
  call CombineGVCFs {
    input: GVCFs=HaplotypeCallerERC.GVCF,
      GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
  }
  call GenotypeGVCFs{
    input: GATK=gatk,
      RefFasta=refFasta,
      RefIndex=refIndex,
      RefDict=refDict,
      sampleName="CEUtrio",
      combinedGVCF=CombineGVCFs.combinedGVCF
  }
}

Thanks!

Answers

  • Tiffany_at_Broad, Cambridge, MA (Member, Administrator, Broadie, Moderator)

    Hi Ricardo! I will pass this along for someone to take a look.

  • RicardoHarripaul, Toronto (Member)

    Thank you Tiffany

  • ChrisL, Cambridge, MA (Member, Broadie, Dev)

    Hi Ricardo, I pasted your WDL files into IntelliJ and saw an error on the line -V ${sep=" -V GVCFs"} \. You'll need to supply an array variable to that placeholder (e.g. -V ${sep=" -V GVCFs" my_variable} \).
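
    To make the placeholder form concrete, here is a minimal sketch of the CombineGVCFs task, assuming the array input keeps the name GVCFs from the original script. The string after sep= is only the separator written between elements, and the variable to expand follows it. The separator is therefore assumed here to be " -V " rather than " -V GVCFs"; that is a reading of the intent, not something confirmed in this thread.

    ```wdl
    task CombineGVCFs {
      File GATK
      File RefFasta
      File RefIndex
      File RefDict
      String sampleName
      Array[File] GVCFs

      command {
        # The placeholder joins the elements of GVCFs with " -V " between them;
        # the leading -V on that line covers the first element.
        java -jar ${GATK} \
          CombineGVCFs \
          -R ${RefFasta} \
          -V ${sep=" -V " GVCFs} \
          -O ${sampleName}_combined.vcf
      }
      output {
        File combinedGVCF = "${sampleName}_combined.vcf"
      }
    }
    ```

    With GVCFs holding a.g.vcf, b.g.vcf and c.g.vcf, the placeholder line would expand to -V a.g.vcf -V b.g.vcf -V c.g.vcf.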

  • RicardoHarripaul, Toronto (Member)

    Thanks, ChrisL,
    I simplified the WDL script to include only the CombineGVCFs step and made the change you suggested, but I am still getting the same error. I believe I am supplying an Array[File]:

    Failed to evaluate input 'GVCFs' (reason 1 of 1): No coercion defined from wom value(s) '[["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12877_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12878_rawLikelihoods.g.vcf"], ["/home/ricardo/Downloads/gatk-4.1.1.0/wdl/helloHaplotypeCaller/helloHaplotypeCaller/NA12882_rawLikelihoods.g.vcf"]]' of type 'Array[Array[File]]' to 'Array[File]'.

    I am using the full paths to the VCF files in the inputs file.
    WDL Script:

    task CombineGVCFs{
      File GATK
      File RefFasta
      File RefIndex
      File RefDict
      File sampleName
      Array[File] GVCFs
      command {
        java -jar ${GATK} \
          CombineGVCFs \
          -R ${RefFasta} \
          -V ${sep=" -V GVCFs" GVCFs} \
          -O ${sampleName}_combined.vcf
      }
      output {
        File combinedGVCF="${sampleName}_combined.vcf"
      }
    }

    workflow CombineGVCF {
      File inputSamplesFile
      Array[Array[File]] inputSamples = read_tsv(inputSamplesFile)
      File gatk
      File refFasta
      File refIndex
      File refDict

      call CombineGVCFs {
        input:
          GATK=gatk,
          RefFasta=refFasta,
          RefIndex=refIndex,
          RefDict=refDict,
          sampleName="CEUtrio",
          GVCFs=inputSamples
      }
    }
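
    The coercion error reports that GVCFs received an Array[Array[File]]: read_tsv returns one inner array per row, even when each row holds a single path. One way around it, sketched below and assuming the inputs file lists one gVCF path per line, is to read the file with read_lines instead. The name inputGVCFs is introduced here for illustration. The call is assumed to target the CombineGVCFs task from this post with sampleName declared as String, since the call passes "CEUtrio", which is not a file path.

    ```wdl
    workflow CombineGVCF {
      File inputSamplesFile
      # read_lines yields one String per line, which can be coerced to Array[File].
      # read_tsv yields one array per row, which is where the nested array came from.
      Array[File] inputGVCFs = read_lines(inputSamplesFile)
      File gatk
      File refFasta
      File refIndex
      File refDict

      call CombineGVCFs {
        input:
          GATK=gatk,
          RefFasta=refFasta,
          RefIndex=refIndex,
          RefDict=refDict,
          sampleName="CEUtrio",
          GVCFs=inputGVCFs
      }
    }
    ```

    If the inputs file has to stay a multi-column TSV, selecting one column inside a scatter block (as the HaplotypeCallerERC call does with sample[1]) is another option. The standard-library flatten function also collapses Array[Array[X]] into Array[X], though whether it is available depends on the WDL version the script runs as.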
