VariantRecalibrator shows error: java.lang.IllegalArgumentException: No data found.

Tong13 Member
edited April 28 in Ask the GATK team
Hi,
I am running VariantRecalibrator with GATK 4.0.10.1 on WGS data. I combined the trio's GVCFs with CombineGVCFs. VariantRecalibrator fails with this error:
```
18:19:38.004 WARN VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
18:19:38.150 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
18:23:57.006 INFO VariantRecalibratorEngine - Finished iteration 0.
18:25:35.317 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 2.18508
18:27:22.226 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 2.44691
18:29:11.724 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.45284
18:31:01.478 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.14604
18:32:51.452 INFO VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.07146
18:34:40.870 INFO VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.04044
18:36:30.225 INFO VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.03140
18:38:20.063 INFO VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.03935
18:40:07.539 INFO VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.02160
18:41:56.840 INFO VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00508
18:43:43.879 INFO VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.00669
18:45:33.833 INFO VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.02022
18:47:22.956 INFO VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.01239
18:49:12.972 INFO VariantRecalibratorEngine - Finished iteration 70. Current change in mixture coefficients = 0.00722
18:51:02.744 INFO VariantRecalibratorEngine - Finished iteration 75. Current change in mixture coefficients = 0.00462
18:52:52.044 INFO VariantRecalibratorEngine - Finished iteration 80. Current change in mixture coefficients = 0.00322
18:54:40.463 INFO VariantRecalibratorEngine - Finished iteration 85. Current change in mixture coefficients = 0.00239
18:56:07.809 INFO VariantRecalibratorEngine - Convergence after 89 iterations!
18:56:23.014 INFO VariantRecalibratorEngine - Evaluating full set of 5415683 variants...
18:56:23.014 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
18:56:23.244 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
18:56:23.247 INFO VariantRecalibrator - Shutting down engine
[April 3, 2019 6:56:23 PM UTC] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 43.62 minutes.
Runtime.totalMemory()=13087801344
java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:656)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:968)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
```
My command line is:

gatk VariantRecalibrator \
  -R /my/path/of/reference/ucsc.hg19.fasta \
  -V /my/path/of/familymerge/vcf/W007.HC.vcf.gz \
  --resource hapmap,known=false,training=true,truth=true,prior=15.0:$GATK_bundle/hg19/hapmap_3.3.hg19.sites.vcf \
  --resource omini,known=false,training=true,truth=false,prior=12.0:$GATK_bundle/hg19/1000G_omni2.5.hg19.sites.vcf \
  --resource 1000G,known=false,training=true,truth=false,prior=10.0:$GATK_bundle/hg19/1000G_phase1.snps.high_confidence.hg19.sites.vcf \
  --resource dbsnp,known=true,training=false,truth=false,prior=6.0:$GATK_bundle/hg19/dbsnp_138.hg19.vcf \
  -an DP -an QD -an FS -an SOR -an ReadPosRankSum -an MQRankSum \
  -mode SNP \
  -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 95.0 -tranche 90.0 \
  --rscript-file /my/path/of/outdir/W007.HC.snps.plots.R \
  --tranches-file /my/path/of/outdir/W007.HC.snps.tranches \
  -O /my/path/of/outdir/W007.HC.snps.recal
Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin
    edited April 29

    Hi @Tong13

    1) How many whole genome samples are you using in the VariantRecalibrator step? If you are using one sample then we recommend you use the GATK CNN tools that were specifically designed for single sample variant filtering. VQSR is meant for a bigger cohort of samples.

    2) Does your log have a message like "Training with 0 variants after standard deviation thresholding" -- if it does, it's possible that all of the training data is failing the standard deviation filter controlled by --standard-deviation-threshold.
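
Both checks above can be done from the shell. The following is only a rough sketch: the function names `count_vcf_samples` and `check_vqsr_log` are made up for illustration and are not part of GATK. It assumes the VariantRecalibrator output was saved to a log file and that the input VCF is gzip/bgzip-compressed and carries genotype columns (so the `#CHROM` header line has 9 fixed columns followed by one column per sample).

```shell
#!/usr/bin/env bash

# Print the number of samples in a compressed VCF by counting the columns
# after the 9 fixed ones (CHROM ... FORMAT) on the #CHROM header line.
# Hypothetical helper, not a GATK command.
count_vcf_samples() {
    local vcf="$1"
    gzip -dc "$vcf" | awk -F'\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }'
}

# Print the "Training with N variants ..." message(s) found in a saved
# VariantRecalibrator log, then the number of NaN warnings it contains.
# Hypothetical helper, not a GATK command.
check_vqsr_log() {
    local log="$1"
    grep -E -o 'Training with [0-9]+ variants after standard deviation thresholding' "$log"
    printf 'NaN warnings: %s\n' "$(grep -c 'Evaluate datum returned a NaN' "$log")"
}
```

For example, `count_vcf_samples /my/path/of/familymerge/vcf/W007.HC.vcf.gz` prints the sample count, and `check_vqsr_log` can be pointed at wherever the tool's output was redirected. A training count of 0 points to the standard deviation filter. A non-zero count together with NaN warnings, as in the log above, means that filter is not what emptied the data.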

  • Tong13 Member
    > @bhanuGandham said:
    > Hi @Tong13
    >
    > 1) How many whole genome samples are you using in the VariantRecalibrator step? If you are using one sample then we recommend you use the GATK CNN tools that were specifically designed for single sample variant filtering. VQSR is meant for a bigger cohort of samples.
    >
    > 2) Does your log have a message like "Training with 0 variants after standard deviation thresholding" -- if it does, it's possible that all of the training data is failing the standard deviation filter controlled by --standard-deviation-threshold.

    Thank you,
    First, I run VariantRecalibrator with at least 3 samples. It finishes for some families' samples but fails for others.
    Second, I could not find a message like "Training with 0 variants after standard deviation thresholding". Instead, the log shows "Training with 4016141 variants after standard deviation thresholding".