
VariantRecalibrator show error: java.lang.IllegalArgumentException: No data found.

Tong13 Member
edited April 2019 in Ask the GATK team
Hi,
I am running VariantRecalibrator with GATK 4.0.10.1 on WGS data. The input is a trio that I merged with CombineGVCFs. The run fails with this error:
```
18:19:38.004 WARN VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
18:19:38.150 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
18:23:57.006 INFO VariantRecalibratorEngine - Finished iteration 0.
18:25:35.317 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 2.18508
18:27:22.226 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 2.44691
18:29:11.724 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.45284
18:31:01.478 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.14604
18:32:51.452 INFO VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.07146
18:34:40.870 INFO VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.04044
18:36:30.225 INFO VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.03140
18:38:20.063 INFO VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.03935
18:40:07.539 INFO VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.02160
18:41:56.840 INFO VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00508
18:43:43.879 INFO VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.00669
18:45:33.833 INFO VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.02022
18:47:22.956 INFO VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.01239
18:49:12.972 INFO VariantRecalibratorEngine - Finished iteration 70. Current change in mixture coefficients = 0.00722
18:51:02.744 INFO VariantRecalibratorEngine - Finished iteration 75. Current change in mixture coefficients = 0.00462
18:52:52.044 INFO VariantRecalibratorEngine - Finished iteration 80. Current change in mixture coefficients = 0.00322
18:54:40.463 INFO VariantRecalibratorEngine - Finished iteration 85. Current change in mixture coefficients = 0.00239
18:56:07.809 INFO VariantRecalibratorEngine - Convergence after 89 iterations!
18:56:23.014 INFO VariantRecalibratorEngine - Evaluating full set of 5415683 variants...
18:56:23.014 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
18:56:23.244 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
18:56:23.247 INFO VariantRecalibrator - Shutting down engine
[April 3, 2019 6:56:23 PM UTC] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 43.62 minutes.
Runtime.totalMemory()=13087801344
java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:656)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:968)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
```
My command line is:

```
gatk VariantRecalibrator \
  -R /my/path/of/reference/ucsc.hg19.fasta \
  -V /my/path/of/familymerge/vcf/W007.HC.vcf.gz \
  --resource hapmap,known=false,training=true,truth=true,prior=15.0:$GATK_bundle/hg19/hapmap_3.3.hg19.sites.vcf \
  --resource omini,known=false,training=true,truth=false,prior=12.0:$GATK_bundle/hg19/1000G_omni2.5.hg19.sites.vcf \
  --resource 1000G,known=false,training=true,truth=false,prior=10.0:$GATK_bundle/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
  --resource dbsnp,known=true,training=false,truth=false,prior=6.0:$GATK_bundle/hg19/dbsnp_138.hg19.vcf \
  -an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
  -mode SNP \
  -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 95.0 -tranche 90.0 \
  --rscript-file /my/path/of/outdir/W007.HC.snps.plots.R \
  --tranches-file /my/path/of/outdir/W007.HC.snps.tranches \
  -O /my/path/of/outdir/W007.HC.snps.recal
```

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited April 2019

    Hi @Tong13

    1) How many whole genome samples are you using in the VariantRecalibrator step? If you are using one sample then we recommend you use the GATK CNN tools that were specifically designed for single sample variant filtering. VQSR is meant for a bigger cohort of samples.

    2) Does your log contain a message like "Training with 0 variants after standard deviation thresholding"? If it does, it's possible that all of the training data is failing the standard deviation filter controlled by --standard-deviation-threshold.

  • Tong13 Member

    Thank you,
    First, I run VariantRecalibrator with at least 3 samples per family. Some families' samples finish successfully and others do not.
    Second, I could not find a message like "Training with 0 variants after standard deviation thresholding". Instead, the log shows "Training with 4016141 variants after standard deviation thresholding".
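
The two checks discussed in this thread (how many samples the VCF contains, and what the log reports about training variants and NaN values) can be scripted. The following is a minimal Python sketch, not part of GATK. The function names `scan_vqsr_log` and `count_vcf_samples` are hypothetical, the sample names are made up, and the message wording is taken only from the text quoted in this thread, so it may differ in other GATK versions.

```python
import re

# Message wording as quoted in this thread; other GATK versions may differ (assumption).
TRAINING_RE = re.compile(
    r"Training with (\d+) variants after standard deviation thresholding"
)
NAN_TEXT = "Evaluate datum returned a NaN"


def scan_vqsr_log(log_text):
    """Return the reported training-variant count (or None) and a NaN-warning flag."""
    match = TRAINING_RE.search(log_text)
    training = int(match.group(1)) if match else None
    return {"training_variants": training, "nan_warning": NAN_TEXT in log_text}


def count_vcf_samples(header_line):
    """Count sample columns on a VCF '#CHROM' header line.

    The first eight columns are fixed. When genotypes are present, a FORMAT
    column follows, and every column after it is one sample.
    """
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a VCF column header line")
    return max(len(fields) - 9, 0)


if __name__ == "__main__":
    # First line: message text from the reply above, timestamp and logger prefix omitted.
    # Second line: copied from the posted log.
    sample_log = (
        "Training with 4016141 variants after standard deviation thresholding\n"
        "18:56:23.014 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.\n"
    )
    print(scan_vqsr_log(sample_log))

    # Hypothetical trio header; the sample names are illustrative only.
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tfather\tmother\tchild"
    print(count_vcf_samples(header))
```

A nonzero training count combined with `nan_warning` set to true matches the situation reported here: training data survived the standard deviation filter, yet evaluation still produced a NaN.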