Variant Recalibrator syntax

I am unable to find the right syntax for the -resource argument of VariantRecalibrator in GATK v4.

In response to the following error message,

A USER ERROR has occurred: No training set found! Please provide sets of known polymorphic loci marked with the training=true feature input tag. For example, -resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmapFile.vcf

I entered this code:

gatk VariantRecalibrator \
-R chr20.fa \
-V case_SNPs.vcf \
-resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
-mode SNP \
--max-gaussians 4 \
-O "case/case_cohort.recal" \
--tranches-file "case/case_cohort.tranches" \
--rscript-file "case/case_cohort.plots"

Though this seems to resemble the recommended syntax, I get an error:

A USER ERROR has occurred: Invalid argument 'hapmap.vcf'.

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @skngs

    You missed a colon ':' in your resource argument. The correct syntax for it is --resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf

    Hope this helps

    Regards
    Bhanu

  • skngs (Member)

    Thank you for your prompt reply @bhanuGandham, but this does not work for me. When I try adding the colon, I get:

    A USER ERROR has occurred: Argument resource has a bad value: hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf. Problem constructing FeatureInput from the string 'hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf'.

  • skngs (Member)

    This works, thank you!

  • meraj, India (Member)

    Hi,
    I am unable to run analysis for variant recalibration using GATK. I am posting the script as below:

    java -jar gatk-package-4.0.9.0-local.jar \
    VariantRecalibrator \
    -R path_to_hum_ref/hg38_file.fasta \
    -V combined_GRC_mak1_hg38.finalvars.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    --max-gaussians 4 \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
    --resource omni,known=false,training=true,truth=false,prior=12.0:1000G_omni2.5.hg38.vcf.gz \
    --resource 1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:Homo_sapiens_assembly38.dbsnp138.vcf \
    -recal-file GRC_mak1_hg38_vqsrsnp.recal \
    --tranches-file GRC_mak1_hg38_vqsrsnp.tranches \
    --rscript-file GRC_mak1_hg38_vqsrsnp.plots.R

    I am getting the following error:
    A USER ERROR has occurred: r is not a recognized option
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Please help me to resolve the issue.
    Thanks and regards,
    Meraj

  • skngs (Member)

    Hi @meraj,

    Try specifying the output recalibration file using -O instead of --recal-file

  • meraj, India (Member)

    Hi @skngs,

    Now the error reads as below:

    A USER ERROR has occurred: Argument resource has a bad value: 1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz. Problem constructing FeatureInput from the string '1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz'.

  • meraj, India (Member)

    @bhanuGandham
    Hi, I changed the command as below and it works now:

    java -jar gatk-package-4.0.9.0-local.jar \
    VariantRecalibrator \
    -R path_to_hum_ref/hg38_file.fasta \
    -V combined_GRC_mak1_hg38.finalvars.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    --max-gaussians 4 \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
    --resource omni,known=false,training=true,truth=false,prior=12.0:omni2.5.hg38.vcf.gz \
    --resource thouzndG,known=false,training=true,truth=false,prior=10.0:thouzndG_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:Homo_sapiens_assembly38.dbsnp138.vcf \
    -O GRC_mak1_hg38_vqsrsnp.recal \
    --tranches-file GRC_mak1_hg38_vqsrsnp.tranches \
    --rscript-file GRC_mak1_hg38_vqsrsnp.plots.R

    Thanks.

    Best,
    Meraj
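
For reference, the --resource form that worked in this thread (GATK 4.0.9.0) can be sketched as a small shell helper. This is only an illustration under stated assumptions: make_resource is a hypothetical function, not part of GATK, and the rule that a resource name should start with a letter is inferred solely from the posts above, where 1000G was rejected and thouzndG was accepted. Other GATK releases may expect a different tag syntax, so check the VariantRecalibrator documentation for the version you run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): builds a --resource value in the
# "name,known=..,training=..,truth=..,prior=..:file" form used in this thread.
make_resource() {
  local name=$1 known=$2 training=$3 truth=$4 prior=$5 file=$6

  # Assumption taken from this thread only: a name beginning with a digit
  # ("1000G") failed, while a name beginning with a letter worked.
  case $name in
    [A-Za-z]*) ;;
    *)
      echo "resource name '$name' should start with a letter" >&2
      return 1
      ;;
  esac

  # Note: no ",VCF" type tag; the working commands in the thread omit it.
  printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' \
    "$name" "$known" "$training" "$truth" "$prior" "$file"
}

# Dry run: print the argument instead of invoking gatk.
echo "--resource $(make_resource hapmap false true true 15.0 hapmap_3.3.hg38.vcf.gz)"

# A name starting with a digit is refused before gatk would ever see it.
make_resource 1000G false true false 10.0 1000G_phase1.snps.high_confidence.hg38.vcf.gz \
  || echo "skipped: rename the resource, e.g. to thouzndG as above"
```

The helper only prints strings, so it can be tried without GATK installed; the value it emits can then be pasted after --resource in a command like the one above.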
