VariantRecalibrator error!

Hi,

I was trying to run the Variant Recalibrator (https://www.broadinstitute.org/gatk/guide/article?id=1259) using the command:

java -Xmx4g -jar ~/GATK-3.3-0/GenomeAnalysisTK.jar \
-T VariantRecalibrator \
-R ~/GATK-3.3-0/bundle/2.8/hg19/ucsc.hg19.fasta \
-input SL545_raw.snps.indels.g.vcf \
-recalFile ./output.recal \
-tranchesFile ./output.tranches \
-nt 8 \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ~/GATK-3.3-0/bundle/2.8/hg19/hapmap_3.3.hg19.sites.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 ~/GATK-3.3-0/bundle/2.8/hg19/1000G_omni2.5.hg19.sites.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 ~/GATK-3.3-0/bundle/2.8/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ~/GATK-3.3-0/bundle/2.8/hg19/dbsnp_138.hg19.vcf \
-an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
-mode SNP

However, I get the following error:

======================================

INFO 08:03:05,529 ProgressMeter - chrX:120229601 2.997049857E9 55.0 m 1.0 s 95.7% 57.5 m 2.5 m
INFO 08:03:35,534 ProgressMeter - chrY:5293901 3.040320417E9 55.5 m 1.0 s 97.0% 57.3 m 104.0 s
INFO 08:04:05,536 ProgressMeter - chrY:23461301 3.059320417E9 56.0 m 1.0 s 97.5% 57.5 m 85.0 s
INFO 08:04:35,540 ProgressMeter - chrY:40999901 3.076320417E9 56.5 m 1.0 s 98.1% 57.6 m 65.0 s
INFO 08:04:54,107 VariantDataManager - QD: mean = NaN standard deviation = NaN
INFO 08:04:57,461 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Bad input: Values for QD annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator
ERROR ------------------------------------------------------------------------------------------

I think I am using the right arguments, but am not sure. It is as per the webpage (I think!). Am I doing something wrong?

thanks!

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    Hi there,

    This can be due to two things: not enough data (e.g. running on too few samples), or the annotations not being present in the VCF file (check whether QD is annotated).
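One way to check whether an annotation such as QD is actually present is to count how many records carry it in the INFO column. The following is a minimal sketch, not part of GATK: the function name `count_info_key` and the example records are made up for illustration, and it assumes an uncompressed, tab-delimited VCF with INFO as the eighth column.

```python
def count_info_key(vcf_lines, key):
    """Return (records_with_key, total_records) for an INFO key such as 'QD'."""
    with_key = 0
    total = 0
    for line in vcf_lines:
        # Skip blank lines and header lines (both '##' meta lines and '#CHROM').
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a well-formed record; ignore it
        total += 1
        # INFO is the 8th column: 'KEY=value' pairs or bare flags, ';'-separated.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if key in info_keys:
            with_key += 1
    return with_key, total


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;QD=12.5\n",
    "chr1\t200\t.\tC\tT\t40\t.\tDP=25\n",
]
print(count_info_key(example, "QD"))
```

On a real file, the same function can be given an open file handle, for example `count_info_key(open("SL545_raw.snps.indels.g.vcf"), "QD")`. A count of zero would be consistent with the "not detected for ANY training variant" message, although a non-zero count does not by itself guarantee that the training sites are annotated.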

  • tommycarstensen (United Kingdom; Member ✭✭✭)

    @bsmith030465 said:
    -input SL545_raw.snps.indels.g.vcf

    Your file name could indicate that you are using a gVCF as input. I don't think you are meant to use gVCFs as input for VR. If your file is not a gVCF, then please ignore my comment. Best of luck getting it to work.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie; admin)

    Good eye, @tommycarstensen. QD isn't annotated in gVCFs iirc.
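The file name is only a hint, so it can help to look inside the file. The sketch below is one possible heuristic rather than an official GATK check: it treats a file as a gVCF if the header contains a `##GVCFBlock` line or if any of the first records lists the symbolic `<NON_REF>` allele in the ALT column, which is what HaplotypeCaller writes in GVCF mode. The function name `looks_like_gvcf`, the `max_records` limit, and the example records are assumptions made for illustration.

```python
def looks_like_gvcf(vcf_lines, max_records=1000):
    """Heuristically decide whether VCF text is a gVCF (see assumptions above)."""
    seen = 0
    for line in vcf_lines:
        # Header lines describing reference blocks are written in GVCF mode.
        if line.startswith("##GVCFBlock"):
            return True
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed record; ignore it
        # ALT is the 5th column; alleles are comma-separated.
        if "<NON_REF>" in fields[4].split(","):
            return True
        seen += 1
        if seen >= max_records:
            break  # only sample the start of large files
    return False


# Hypothetical records, for illustration only.
gvcf_example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG,<NON_REF>\t50\t.\tDP=30\n",
]
plain_example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30;QD=12.5\n",
]
print(looks_like_gvcf(gvcf_example), looks_like_gvcf(plain_example))
```

If the check returns `True`, the file is probably an intermediate gVCF that still needs to go through GenotypeGVCFs before VariantRecalibrator.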

  • bsmith030465 (ca; Member)

    Hi,

    Good catch! So I used GenotypeGVCFs to make a VCF file, but I get a similar error when I run that VCF through VariantRecalibrator. I have about 9 WGS samples. The command and error are:

    command:

    java -Xmx4g -jar ~/GATK-3.3-0/GenomeAnalysisTK.jar \
    -T VariantRecalibrator \
    -R ~/GATK-3.3-0/bundle/2.8/hg19/ucsc.hg19.fasta \
    -input out_9_samples.vcf \
    -recalFile ./output.recal \
    -tranchesFile ./output.tranches \
    -nt 8 \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 ~/GATK-3.3-0/bundle/2.8/hg19/hapmap_3.3.hg19.sites.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 ~/GATK-3.3-0/bundle/2.8/hg19/1000G_omni2.5.hg19.sites.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 ~/GATK-3.3-0/bundle/2.8/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ~/GATK-3.3-0/bundle/2.8/hg19/dbsnp_138.hg19.vcf \
    -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP -an InbreedingCoeff \
    -mode SNP

    Error:

    INFO 08:24:35,255 VariantDataManager - FS: mean = 3.43 standard deviation = 4.72
    INFO 08:24:37,547 ProgressMeter - chrUn_gl000249:9892 6.952864E7 14.5 m 12.0 s 100.0% 14.5 m 0.0 s
    INFO 08:24:38,555 VariantDataManager - SOR: mean = 0.78 standard deviation = 0.39
    INFO 08:24:41,866 VariantDataManager - DP: mean = 229.54 standard deviation = 33.01
    INFO 08:24:45,068 VariantDataManager - InbreedingCoeff: mean = NaN standard deviation = NaN
    INFO 08:24:56,192 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.3-0-g37228af):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Bad input: Values for InbreedingCoeff annotation not detected for ANY training variant in the input callset. VariantAnnotator may be used to add these annotations. See http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator

    ==========
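The `VariantDataManager` lines in the log already show which annotation is the problem: the one whose mean is reported as NaN. As a rough sketch, the log can be scanned for those lines automatically. The helper name `annotations_without_values` is made up for illustration, and the pattern assumes the log layout shown above (annotation name, then `mean = ...`, then `standard deviation = ...`).

```python
import re

# Matches lines such as:
#   INFO 08:24:45,068 VariantDataManager - InbreedingCoeff: mean = NaN standard deviation = NaN
STATS_LINE = re.compile(
    r"VariantDataManager\s+-\s+(\S+):\s+mean\s+=\s+(\S+)\s+standard\s+deviation\s+=\s+(\S+)"
)


def annotations_without_values(log_lines):
    """Return the annotation names whose reported mean is NaN, in log order."""
    missing = []
    for line in log_lines:
        match = STATS_LINE.search(line)
        if match and match.group(2) == "NaN":
            missing.append(match.group(1))
    return missing


# Log lines copied from the run above.
log = [
    "INFO 08:24:35,255 VariantDataManager - FS: mean = 3.43 standard deviation = 4.72",
    "INFO 08:24:38,555 VariantDataManager - SOR: mean = 0.78 standard deviation = 0.39",
    "INFO 08:24:41,866 VariantDataManager - DP: mean = 229.54 standard deviation = 33.01",
    "INFO 08:24:45,068 VariantDataManager - InbreedingCoeff: mean = NaN standard deviation = NaN",
]
print(annotations_without_values(log))
```

For this run only InbreedingCoeff is flagged; the other annotations listed have usable values. Either cause mentioned earlier in the thread (too few samples, or the annotation missing from the VCF) could apply to it.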

    I am now looking at a way to use VariantAnnotator as suggested in the error message (http://gatkforums.broadinstitute.org/discussion/49/using-variant-annotator):

    java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T VariantAnnotator \
    -I input.bam \
    -o output.vcf \
    -A Coverage \
    --variant input.vcf \
    -L input.vcf \
    --dbsnp dbsnp.vcf

    So, for each of the 9 samples, I need to specify the recalibrated reads BAM file (-I above) and the sampleID.g.vcf as input (--variant and -L above), and I'll get my annotated output.vcf file. I'll then redo GenotypeGVCFs, and then redo VariantRecalibrator.

    Does this sound right?

    thanks!
