Failures running VariantRecalibrator
We want to run joint germline calling on a set of 122 WES BRCA normal hg19 BAMs from the CPTAC 3 project, using the GATK4 workflows showcased in the Terra workspace https://app.terra.bio/#workspaces/help-gatk/Germline-SNPs-Indels-GATK4-b37. Our data are already aligned to hg19, so of the three workflows in the showcase workspace we are running two: haplotypecaller-gvcf-gatk4 and joint-discovery-gatk4. The problems are in the joint-discovery-gatk4 workflow, specifically in the VariantRecalibrator tasks. For now we are running on only 3 sample GVCFs, just to confirm that the pipeline runs end to end; we recognize that VariantRecalibrator needs a minimum of 30 exome samples. We are using GATK4 v220.127.116.11.
Both VariantRecalibrator task instances fail with essentially the same error:
task instance: JointGenotyping.SNPsVariantRecalibratorClassic:
A USER ERROR has occurred: Couldn't read file file:///cromwell_root/hapmap,known=false,training=true,truth=true,prior=15:/cromwell_root/broad-references/hg19/v0/hapmap_3.3.b37.vcf.gz. Error was: It doesn't exist.
task instance: JointGenotyping.IndelsVariantRecalibrator:
A USER ERROR has occurred: Couldn't read file file:///cromwell_root/mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf. Error was: It doesn't exist.
Here is the Java command line for the INDEL recalibration task (from the task log file in Terra):
Using GATK jar /gatk/gatk-package-18.104.22.168-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -Xms24g -jar /gatk/gatk-package-22.214.171.124-local.jar VariantRecalibrator -V /cromwell_root/fc-secure-823808d0-5404-49c9-990f-b3d9e353e468/02fdb905-0a50-47d5-9a0b-8abb8d0a9636/JointGenotyping/71c87f4b-5e0e-40bc-9b61-71a5e52ac82a/call-SitesOnlyGatherVcf/CBB_Test.sites_only.vcf.gz -O CBB_Test.indels.recal --tranches-file CBB_Test.indels.tranches --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 -tranche 93.5 -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 -an FS -an ReadPosRankSum -an MQRankSum -an QD -an SOR -an DP -mode INDEL --max-gaussians 4 -resource mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -resource axiomPoly,known=false,training=true,truth=false,prior=10:/cromwell_root/broad-references/hg19/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.vcf.gz -resource dbsnp,known=true,training=false,truth=false,prior=2:/cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz
The problem is clearly the attribute list that precedes each -resource path: GATK4 is interpreting those attributes as part of the filename.
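If that diagnosis is right, one possible workaround is to rewrite each value into the tagged form shown in the GATK 4.1 VariantRecalibrator documentation. In that form the resource name and attributes are attached to the argument itself and the path is a separate token (--resource:name,known=...,prior=... /path/to/file.vcf). The bash sketch below assumes that syntax applies to the GATK version used in this workspace. convert_resource_arg is a hypothetical helper for illustration, not part of GATK or the Terra WDL.

```shell
#!/usr/bin/env bash
# Sketch only: converts an old-style VariantRecalibrator resource value
#   name,attr=val,...:/path/to/resource.vcf
# into the tagged two-token form we believe GATK 4.1+ expects:
#   --resource:name,attr=val,... /path/to/resource.vcf
# convert_resource_arg is a hypothetical helper, not part of GATK or the WDL.
convert_resource_arg() {
  local old="$1"
  # Refuse input that has no ':' separating the tags from the path.
  case "$old" in
    *:*) ;;
    *) echo "convert_resource_arg: no ':' separator in '$old'" >&2
       return 1 ;;
  esac
  # The tags come before the FIRST ':'; the path is everything after it,
  # so paths that themselves contain ':' (e.g. gs:// URLs) survive intact.
  local tags="${old%%:*}"
  local path="${old#*:}"
  # Print both tokens on one line, separated by a single space.
  printf '%s %s\n' "--resource:${tags}" "${path}"
}

# Example: the three INDEL resources from the failing command line.
for r in \
  'mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf' \
  'axiomPoly,known=false,training=true,truth=false,prior=10:/cromwell_root/broad-references/hg19/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.vcf.gz' \
  'dbsnp,known=true,training=false,truth=false,prior=2:/cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz'
do
  convert_resource_arg "$r"
done
```

Each output line holds two separate command-line tokens (the tagged --resource argument, then the file path), so each -resource entry in the task's command would become one such pair.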