Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Failures running VariantRecalibrator

birgerbirger Member, Broadie, CGA-mod ✭✭✭
edited October 9 in Ask the GATK team

We want to run joint germline calling on a set of 122 WES BRCA normal hg19 BAMs from the CPTAC 3 project. We are using the GATK4 workflows showcased in the Terra workspace https://app.terra.bio/#workspaces/help-gatk/Germline-SNPs-Indels-GATK4-b37. We are starting with data that has already been aligned to hg19, so of the three workflows in the showcase workspace, we are running two: haplotypecaller-gvcf-gatk4 and joint-discovery-gatk4. We are encountering problems with the joint-discovery-gatk4 workflow, in particular, in the running of the VariantRecalibrator task. Initially, we are just running on 3 sample gvcfs, recognizing that you need at a minumum 30 exome samples, just to ensure we can run the pipeline. We are using gatk4 v4.1.2.0.

We are getting more or less the same error for both instances of the VariantRecalibrator task...

task instance: JointGenotyping.SNPsVariantRecalibratorClassic:

A USER ERROR has occurred: Couldn't read file file:///cromwell_root/hapmap,known=false,training=true,truth=true,prior=15:/cromwell_root/broad-references/hg19/v0/hapmap_3.3.b37.vcf.gz. Error was: It doesn't exist.

task instance: JointGenotyping.IndelsVariantRecalibrator:

A USER ERROR has occurred: Couldn't read file file:///cromwell_root/mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf. Error was: It doesn't exist.

Here is the java command line (from the task log file in Terra):

Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar

Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx24g -Xms24g -jar /gatk/gatk-package-4.1.2.0-local.jar VariantRecalibrator -V /cromwell_root/fc-secure-823808d0-5404-49c9-990f-b3d9e353e468/02fdb905-0a50-47d5-9a0b-8abb8d0a9636/JointGenotyping/71c87f4b-5e0e-40bc-9b61-71a5e52ac82a/call-SitesOnlyGatherVcf/CBB_Test.sites_only.vcf.gz -O CBB_Test.indels.recal --tranches-file CBB_Test.indels.tranches --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 -tranche 95.0 -tranche 94.0 -tranche 93.5 -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 -an FS -an ReadPosRankSum -an MQRankSum -an QD -an SOR -an DP -mode INDEL --max-gaussians 4 -resource mills,known=false,training=true,truth=true,prior=12:/cromwell_root/broad-references/hg19/v0/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -resource axiomPoly,known=false,training=true,truth=false,prior=10:/cromwell_root/broad-references/hg19/v0/Axiom_Exome_Plus.genotypes.all_populations.poly.vcf.gz -resource dbsnp,known=true,training=false,truth=false,prior=2:/cromwell_root/broad-references/hg19/v0/dbsnp_138.b37.vcf.gz

The problem is clearly with the attributes that prepend the -resource input parameter... they are being interpreted as part of the filename by gatk4.

Post edited by bshifaw on

Answers

  • birgerbirger Member, Broadie, CGA-mod ✭✭✭

    It would be great if we could get an answer to this....we need to complete the germline analysis ahead of a CPTAC face-to-face meeting next week.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hi @birger,

    We are taking a look at this now and will respond shortly.

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin

    Hi @birger,

    @bshifaw pointed out in the log that you shared that the resource argument isn't being used correctly. For example, it shows-resource mills,... when it should run like this --resource:mills,...
    which is what the 1.1.1 version of the WDL describes in the task.

    Are you using the latest version of the WDL? If you can share your workspace with our group, we can take a closer look too.

  • birgerbirger Member, Broadie, CGA-mod ✭✭✭

    It looks like we are running the wrong version. In our workspace we are running

    Snapshot:
    11
    Source: gatk/joint-discovery-gatk4/11

    While the showcase workspace employs:

    Version:
    1.1.1
    Source: github.com/gatk-workflows/gatk4-germline-snps-indels/joint-discovery-gatk4

    Let me switch to what is used in the showcase workspace and I'll let you know whether that resolves the issues. THANK YOU!

  • Tiffany_at_BroadTiffany_at_Broad Cambridge, MAMember, Administrator, Broadie, Moderator admin
Sign In or Register to comment.