Error with VariantRecalibrator: java.lang.IllegalArgumentException: No data found.

Dear GATK Team,

I have one whole-genome dataset with variants called by HaplotypeCaller. I would like to run VariantRecalibrator to recalibrate my variant set, but I get the following error:

INFO 22:05:17,683 ProgressMeter - chrY:59361069 6.7818693E7 68.2 m 60.0 s 98.7% 69.1 m 54.0 s
INFO 22:05:21,996 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00201
INFO 22:05:47,684 ProgressMeter - chrY:59361069 6.7818693E7 68.7 m 60.0 s 98.7% 69.6 m 55.0 s
INFO 22:06:01,899 VariantRecalibratorEngine - Convergence after 51 iterations!
INFO 22:06:18,103 ProgressMeter - chrY:59361069 6.7818693E7 69.2 m 61.0 s 98.7% 70.1 m 55.0 s
INFO 22:06:24,494 VariantRecalibratorEngine - Evaluating full set of 3869624 variants...
INFO 22:06:24,658 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.

ERROR --
ERROR stack trace

java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:489)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:185)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No data found.
ERROR ----------------------------------------------------

The commands I used were as follows:
java -Xmx5G -jar GenomeAnalysisTK3.7.jar -T SelectVariants -R hg19.fasta -V NA12878_1.vcf.gz -selectType SNP --excludeNonVariants -o NA12878_1.raw.snp.vcf.gz && \
java -Xmx5G -jar GenomeAnalysisTK3.7.jar -T VariantRecalibrator -R hg19.fasta -input NA12878_1.raw.snp.vcf.gz \
-resource:hapmap,known=false,training=true,truth=true,prior=15.0 ./gatk/hapmap_3.3.hg19.vcf \
-resource:omni,known=false,training=true,truth=true,prior=12.0 ./gatk/1000G_omni2.5.hg19.vcf \
-resource:1000G,known=false,training=true,truth=false,prior=10.0 ./gatk/1000G_phase1.snps.high_confidence.hg19.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=2.0 ./gatk/dbsnp_138.hg19.vcf \
-an DP -an QD -an FS -an SOR -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
-recalFile NA12878_1.recalibrate_SNP.recal -tranchesFile NA12878_1.recalibrate_SNP.tranches -rscriptFile NA12878_1.recalibrate_SNP_plots.R

What do I do next?

PS: I get correct results for two other WGS datasets using the same command.

Thank you for your help in advance,
Kind regards,

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @Wangxb
    Hi,

    One whole genome is technically supposed to have enough data for VQSR, but it is better to have more. Is it possible to combine the three VCFs into one and run VQSR on that?

    Also, you may find this article interesting.

    -Sheila

  • lizhichao (Member)

    Hi,
    I ran VQSR (GATK 3.7) in a WGS pipeline (~50X FASTQ) to filter SNP sites, and it failed with "java.lang.IllegalArgumentException: No data found."
    My commands:
    java -Xmx10G -Djava.io.tmpdir=./java_tmp -jar /opt/bin/GenomeAnalysisTK.jar -T SelectVariants -R /l3bioinfo/test-data/ucsc.hg19.fasta -V /l3bioinfo/test-data/15001710502512A.vcf.gz -selectType SNP --excludeNonVariants -o 15001710502512A.raw.snp.vcf.gz && \
    java -Xmx10G -Djava.io.tmpdir=./java_tmp -jar /opt/bin/GenomeAnalysisTK.jar -T VariantRecalibrator -R /l3bioinfo/test-data/ucsc.hg19.fasta -input 15001710502512A.raw.snp.vcf.gz \
    -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /l3bioinfo/test-data/hapmap_3.3.hg19.vcf \
    -resource:omni,known=false,training=true,truth=true,prior=12.0 /l3bioinfo/test-data/1000G_omni2.5.hg19.vcf \
    -resource:1000G,known=false,training=true,truth=false,prior=10.0 /l3bioinfo/test-data/1000G_phase1.snps.high_confidence.hg19.vcf \
    -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 /l3bioinfo/test-data/dbsnp_138.hg19.vcf \
    -an DP -an QD -an FS -an SOR -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 --maxGaussians 8 \
    -recalFile 15001710502512A.recalibrate_snp.recal -tranchesFile 15001710502512A.recalibrate_snp.tranches -rscriptFile 15001710502512A.recalibrate_snp_plots.R

    My questions are:
    1) Why does this happen? I think there are enough variants for VariantRecalibrator to build the model.
    2) When I reran only this step, it failed again. But when I reran the whole WGS pipeline (bwa-markdup-haplotype-vqsr), it succeeded. Why?

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @lizhichao
    Hi,

    Did you do anything differently in the re-run? How many variants were in the original VCF compared to the "re-run" VCF?

    -Sheila
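
To compare variant counts between the original and the re-run VCF, counting non-header records is usually enough. The following is a minimal sketch, assuming only standard POSIX tools (`awk`, `gzip`, `cat`). The function names `vcf_cat`, `count_variants`, and `count_with_annotation` are hypothetical helpers made up for this example and are not part of GATK. The second counter reports how many records carry a given INFO key, such as one of the annotations passed with `-an`.

```shell
# Hypothetical helpers (not part of GATK) for comparing two VCFs.

# Print a VCF to stdout, decompressing if the file name ends in .gz.
vcf_cat() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Count variant records, i.e. lines that do not start with '#'.
count_variants() {
    vcf_cat "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Count records whose INFO column (field 8) contains the given key.
count_with_annotation() {
    vcf_cat "$1" | awk -F'\t' -v key="$2" '
        !/^#/ {
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++) {
                split(fields[i], kv, "=")
                if (kv[1] == key) { hits++; break }
            }
        }
        END { print hits + 0 }'
}
```

For example, `count_variants 15001710502512A.raw.snp.vcf.gz` prints the number of SNP records in the file produced by the SelectVariants step above, and `count_with_annotation 15001710502512A.raw.snp.vcf.gz ReadPosRankSum` prints how many of those records have a `ReadPosRankSum` value. Running both on the failing and the successful VCF gives a direct side-by-side comparison.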
