Failures running VariantRecalibrator

birger Member, Broadie, CGA-mod
edited October 9 in Ask the GATK team

We want to run joint germline calling on a set of 122 WES BRCA normal hg19 BAMs from the CPTAC 3 project, using the GATK4 workflows showcased in the Terra workspace. Because our data have already been aligned to hg19, we are running only two of the three workflows in the showcase workspace: haplotypecaller-gvcf-gatk4 and joint-discovery-gatk4. The joint-discovery-gatk4 workflow is failing, specifically in the VariantRecalibrator task. For now we are running on just 3 sample GVCFs to confirm that the pipeline runs end to end, recognizing that VariantRecalibrator needs a minimum of 30 exome samples. We are using GATK v4.1.2.0.

We get essentially the same error from both VariantRecalibrator task instances:

task instance: JointGenotyping.SNPsVariantRecalibratorClassic:

A USER ERROR has occurred: Couldn't read file file:///cromwell_root/hapmap,known=false,training=true,truth=true,prior=15:/cromwell_root/broad-references/hg19/v0/hapmap_3.3.b37.vcf.gz. Error was: It doesn't exist.

task instance: JointGenotyping.IndelsVariantRecalibrator:

A USER ERROR has occurred: Couldn't read file file:///cromwell_root/mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf. Error was: It doesn't exist.

Here is the Java command line for the indel task (from the task log in Terra):

Using GATK jar /gatk/gatk-package-

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -Xms24g -jar /gatk/gatk-package- VariantRecalibrator -V /cromwell_root/fc-secure-823808d0-5404-49c9-990f-b3d9e353e468/02fdb905-0a50-47d5-9a0b-8abb8d0a9636/JointGenotyping/71c87f4b-5e0e-40bc-9b61-71a5e52ac82a/call-SitesOnlyGatherVcf/CBB_Test.sites_only.vcf.gz -O CBB_Test.indels.recal --tranches-file CBB_Test.indels.tranches --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 -tranche 93.5 -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 -an FS -an ReadPosRankSum -an MQRankSum -an QD -an SOR -an DP -mode INDEL --max-gaussians 4 -resource mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -resource axiomPoly,known=false,training=true,truth=false,prior=10:/cromwell_root/broad-references/hg19/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.vcf.gz -resource dbsnp,known=true,training=false,truth=false,prior=2:/cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz

The problem is clearly with the tag attributes prepended to the file path in each -resource argument: GATK 4 is interpreting them as part of the file name.
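Taking the failing value apart by hand shows what GATK was meant to receive. The minimal Python sketch below assumes the GATK 4.0-style layout name,key=value,...:path seen in the log above; split_legacy_resource is a hypothetical helper, not a GATK function. It splits at the first colon into the tag list and the file path. A parser that does not expect the tag prefix would instead treat the whole string as one relative path, which is consistent with the error reporting file:///cromwell_root/mills,... as the missing file.

```python
from typing import Dict, Tuple


def split_legacy_resource(value: str) -> Tuple[str, Dict[str, str], str]:
    """Hypothetical helper (not part of GATK): split a GATK 4.0-style
    resource value "name,key=value,...:path" into name, tags, and path."""
    # The tag list ends at the first colon; everything after it is the path.
    tags_part, sep, path = value.partition(":")
    if not sep:
        raise ValueError(f"no ':' separating tags from path in {value!r}")
    # The first comma-separated field is the resource name, the rest are key=value tags.
    name, *pairs = tags_part.split(",")
    tags = dict(pair.split("=", 1) for pair in pairs)
    return name, tags, path


value = ("mills,known=false,training=true,truth=true,prior=12:"
         "/cromwell_root/broad-references/hg19/v0/"
         "Mills_and_1000G_gold_standard.indels.b37.sites.vcf")
name, tags, path = split_legacy_resource(value)
print(name)  # mills
print(tags)  # {'known': 'false', 'training': 'true', 'truth': 'true', 'prior': '12'}
print(path)
```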

  • birger Member, Broadie, CGA-mod

    It would be great if we could get an answer to this. We need to complete the germline analysis ahead of a CPTAC face-to-face meeting next week.

  • Tiffany_at_Broad Cambridge, MA Member, Administrator, Broadie, Moderator admin

    Hi @birger,

    We are taking a look at this now and will respond shortly.

  • Tiffany_at_Broad Cambridge, MA Member, Administrator, Broadie, Moderator admin

    Hi @birger,

    @bshifaw pointed out that, in the log you shared, the resource argument isn't being used correctly. For example, it shows -resource mills,... when it should be --resource:mills,...
    which is the form the 1.1.1 version of the WDL uses in the task.

    Are you using the latest version of the WDL? If you can share your workspace with our group, we can take a closer look too.
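The change described above moves the tags off the front of the file path and onto the argument name. As a hedged illustration only, the Python sketch below rewrites GATK 4.0-style "-resource name,tags:path" pairs into the "--resource:name,tags path" form. rewrite_resource_args is hypothetical and not part of GATK or the WDL; switching to the matching WDL version, as discussed in this thread, is the actual fix.

```python
from typing import List


def rewrite_resource_args(argv: List[str]) -> List[str]:
    """Hypothetical helper (not part of GATK or the WDL): convert
    "-resource name,tags:path" pairs into "--resource:name,tags path"."""
    out: List[str] = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token in ("-resource", "--resource") and i + 1 < len(argv):
            tags, sep, path = argv[i + 1].partition(":")
            if sep:
                # Tags move onto the argument name; the path becomes a plain value.
                out.extend([f"--resource:{tags}", path])
                i += 2
                continue
        # Anything else (including already-converted arguments) passes through unchanged.
        out.append(token)
        i += 1
    return out


old = [
    "-mode", "INDEL",
    "-resource", "mills,known=false,training=true,truth=true,prior=12:"
                 "/cromwell_root/broad-references/hg19/v0/"
                 "Mills_and_1000G_gold_standard.indels.b37.sites.vcf",
    "-resource", "dbsnp,known=true,training=false,truth=false,prior=2:"
                 "/cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz",
]
print(" ".join(rewrite_resource_args(old)))
```

This assumes the tag list always comes before the first colon, so a bare path with a URI scheme such as gs:// passed to -resource without tags would be split incorrectly.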

  • birgerbirger Member, Broadie, CGA-mod ✭✭✭

    It looks like we are running the wrong version. In our workspace we are running:

    Source: gatk/joint-discovery-gatk4/11

    While the showcase workspace employs:


    Let me switch to what is used in the showcase workspace and I'll let you know whether that resolves the issues. THANK YOU!

  • Tiffany_at_Broad Cambridge, MA Member, Administrator, Broadie, Moderator admin
