Variant Recalibrator syntax

I am unable to find the right syntax for the -resource argument of VariantRecalibrator in GATK v4.

In response to the following error message,

A USER ERROR has occurred: No training set found! Please provide sets of known polymorphic loci marked with the training=true feature input tag. For example, -resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmapFile.vcf

I entered this code:

gatk VariantRecalibrator \
-R chr20.fa \
-V case_SNPs.vcf \
-resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0 hapmap.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
-mode SNP \
--max-gaussians 4 \
-O "case/case_cohort.recal" \
--tranches-file "case/case_cohort.tranches" \
--rscript-file "case/case_cohort.plots"

Though this seems to resemble the recommended syntax, I get an error:

A USER ERROR has occurred: Invalid argument 'hapmap.vcf'.
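
One way to read the "Invalid argument" error above: the shell splits the command line on spaces, so the file name after the space reaches GATK as a separate argument rather than as part of the -resource value. A minimal sketch in plain sh, using a hypothetical count_args helper (not part of GATK) that only reports how many arguments it receives. The colon form is the one used by the command reported to work later in this thread:

```shell
#!/bin/sh
# count_args is a hypothetical helper for illustration; it is not part of GATK.
# It prints the number of arguments the shell passed to it.
count_args() {
    echo "$#"
}

# Space before the file: the shell passes the file as its own argument.
count_args -resource hapmap,known=false,training=true,truth=true,prior=12.0 hapmap.vcf

# Colon before the file: the tags and the file travel together as one value.
count_args --resource hapmap,known=false,training=true,truth=true,prior=12.0:hapmap.vcf
```

The first call prints 3 and the second prints 2. This only shows how the shell tokenizes the line; whether GATK accepts the value still depends on the tag list and on the GATK version.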

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @skngs

    You missed a colon ':' in your resource argument. The correct syntax for it is --resource hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf

    Hope this helps

    Regards
    Bhanu

  • skngs (Member)

    Thank you for your prompt reply @bhanuGandham, but this does not work for me. When I try adding the colon, I get:

    A USER ERROR has occurred: Argument resource has a bad value: hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf. Problem constructing FeatureInput from the string 'hapmap,VCF,known=false,training=true,truth=true,prior=12.0:hapmap.vcf'.

  • skngs (Member)

    This works, thank you!

  • meraj, India (Member)

    Hi,
    I am unable to run analysis for variant recalibration using GATK. I am posting the script as below:

    java -jar gatk-package-4.0.9.0-local.jar \
    VariantRecalibrator \
    -R path_to_hum_ref/hg38_file.fasta \
    -V combined_GRC_mak1_hg38.finalvars.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    --max-gaussians 4 \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
    --resource omni,known=false,training=true,truth=false,prior=12.0:1000G_omni2.5.hg38.vcf.gz \
    --resource 1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:Homo_sapiens_assembly38.dbsnp138.vcf \
    -recal-file GRC_mak1_hg38_vqsrsnp.recal \
    --tranches-file GRC_mak1_hg38_vqsrsnp.tranches \
    --rscript-file GRC_mak1_hg38_vqsrsnp.plots.R

    I am getting the following error:
    A USER ERROR has occurred: r is not a recognized option
    Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

    Please help me to resolve the issue.
    Thanks and regards,
    Meraj

  • skngs (Member)

    Hi @meraj,

    Try specifying the output recalibration file using -O instead of -recal-file.

  • meraj, India (Member)

    Hi @skngs ,

    Now the error reads as below:

    A USER ERROR has occurred: Argument resource has a bad value: 1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz. Problem constructing FeatureInput from the string '1000G,known=false,training=true,truth=false,prior=10.0:1000G_phase1.snps.high_confidence.hg38.vcf.gz'.

  • meraj, India (Member)

    @bhanuGandham
    Hi, I changed the command as below, replacing every "1000G" in the resource names and file names, and it works now:

    java -jar gatk-package-4.0.9.0-local.jar \
    VariantRecalibrator \
    -R path_to_hum_ref/hg38_file.fasta \
    -V combined_GRC_mak1_hg38.finalvars.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR \
    -mode SNP \
    --max-gaussians 4 \
    --resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf.gz \
    --resource omni,known=false,training=true,truth=false,prior=12.0:omni2.5.hg38.vcf.gz \
    --resource thouzndG,known=false,training=true,truth=false,prior=10.0:thouzndG_phase1.snps.high_confidence.hg38.vcf.gz \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:Homo_sapiens_assembly38.dbsnp138.vcf \
    -O GRC_mak1_hg38_vqsrsnp.recal \
    --tranches-file GRC_mak1_hg38_vqsrsnp.tranches \
    --rscript-file GRC_mak1_hg38_vqsrsnp.plots.R

    Thanks.

    Best,
    Meraj
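
The points raised in this thread (tags and file joined by a colon, no VCF token in the tag list, -O for the recalibration output, and a backslash at the end of every continued line) can be collected into a dry-run sketch. The resource_arg helper is hypothetical and not part of GATK, and the file names are the ones posted above. The string layout is assumed from the command reported to work with GATK 4.0.9.0. Other GATK4 releases may expect a different form, so check gatk VariantRecalibrator --help for your version:

```shell
#!/bin/sh
# resource_arg is a hypothetical helper for illustration; it is not part of GATK.
# Arguments: name known training truth prior file
# Prints one --resource value in the "name,key=value,...:file" layout.
resource_arg() {
    printf '%s,known=%s,training=%s,truth=%s,prior=%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

HAPMAP=$(resource_arg hapmap false true true 15.0 hapmap_3.3.hg38.vcf.gz)
DBSNP=$(resource_arg dbsnp true false false 2.0 Homo_sapiens_assembly38.dbsnp138.vcf)

# Dry run: echo prints the assembled command so it can be inspected first.
# Every continued line ends with a backslash, including the -R line.
echo gatk VariantRecalibrator \
    -R path_to_hum_ref/hg38_file.fasta \
    -V combined_GRC_mak1_hg38.finalvars.vcf \
    --resource "$HAPMAP" \
    --resource "$DBSNP" \
    -an QD -an MQ \
    -mode SNP \
    -O GRC_mak1_hg38_vqsrsnp.recal \
    --tranches-file GRC_mak1_hg38_vqsrsnp.tranches
```

Removing the echo would run the command for real. Building each resource string in one place keeps the tag order consistent across resources and makes a stray space or a missing colon easier to spot.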
