VariantRecalibrator indel error: No data found

robc (Ireland, Member)
I'm running VariantRecalibrator on a whole-genome VCF. It works in SNP mode but not in INDEL mode. The log files for the two workflows look identical up to the TrancheManager step. From there, the SNP workflow runs to completion and produces the recal, tranches, and plot files. The indel workflow instead prints the warning "Evaluate datum returned a NaN" before reaching TrancheManager, then fails with "java.lang.IllegalArgumentException: No data found." and leaves empty or incomplete output files that contain only headers.


# command for indel mode
gatk-4.1.0.0/gatk VariantRecalibrator -V VCF/sample.vcf -R hs37d5.fa \
--resource:mills,known=false,training=true,truth=true,prior=12.0 GATK4_training_resources/GRCh37/indels/Mills_and_1000G_gold_standard.indels.b37.vcf \
--resource:dbsnp,known=true,training=false,truth=false,prior=2.0 GATK4_training_resources/GRCh37/SNPs/dbsnp_138.b37.vcf \
-an DP -an FS -an QD -an SOR -an MQRankSum -an ReadPosRankSum \
-mode INDEL \
--output $VQSR/sample_indel.recal --tranches-file $VQSR/sample_indel.tranches --rscript-file $VQSR/sample_indel.plots.R

# log file for indel mode
Using GATK jar gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar VariantRecalibrator -V VCF/LH0067A_WGS.vcf -R hs37d5.fa --resource:mills,known=false,training=true,truth=true,prior=12.0 GATK4_training_resources/GRCh37/indels/Mills_and_1000G_gold_standard.indels.b37.vcf --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 GATK4_training_resources/GRCh37/SNPs/dbsnp_138.b37.vcf -an DP -an FS -an QD -an SOR -an MQRankSum -an ReadPosRankSum -mode INDEL --output VCF/VQSR/LH0067A_WGS_indel.recal --tranches-file VCF/VQSR/LH0067A_WGS_indel.tranches --rscript-file VCF/VQSR/LH0067A_WGS_indel.plots.R
14:13:48.795 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk-4.1.0.0/gatk-package-4.1.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
14:13:50.550 INFO VariantRecalibrator - ------------------------------------------------------------
14:13:50.551 INFO VariantRecalibrator - The Genome Analysis Toolkit (GATK) v4.1.0.0
14:13:50.551 INFO VariantRecalibrator - For support and documentation go to
14:13:50.551 INFO VariantRecalibrator - Executing on Linux v4.13.0-21-generic amd64
14:13:50.551 INFO VariantRecalibrator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_171-b11
14:13:50.552 INFO VariantRecalibrator - Start Date/Time: 26 August 2019 14:13:48 IST
14:13:50.552 INFO VariantRecalibrator - ------------------------------------------------------------
14:13:50.552 INFO VariantRecalibrator - ------------------------------------------------------------
14:13:50.553 INFO VariantRecalibrator - HTSJDK Version: 2.18.2
14:13:50.553 INFO VariantRecalibrator - Picard Version: 2.18.25
14:13:50.553 INFO VariantRecalibrator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
14:13:50.553 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
14:13:50.553 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
14:13:50.553 INFO VariantRecalibrator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
14:13:50.554 INFO VariantRecalibrator - Deflater: IntelDeflater
14:13:50.554 INFO VariantRecalibrator - Inflater: IntelInflater
14:13:50.554 INFO VariantRecalibrator - GCS max retries/reopens: 20
14:13:50.554 INFO VariantRecalibrator - Requester pays: disabled
14:13:50.554 INFO VariantRecalibrator - Initializing engine
14:13:50.899 INFO FeatureManager - Using codec VCFCodec to read file file:///GATK4_training_resources/GRCh37/indels/Mills_and_1000G_gold_standard.indels.b37.vcf
14:13:50.949 INFO FeatureManager - Using codec VCFCodec to read file file:///GATK4_training_resources/GRCh37/SNPs/dbsnp_138.b37.vcf
14:13:51.011 INFO FeatureManager - Using codec VCFCodec to read file file:///VCF/LH0067A_WGS.vcf
14:13:51.133 INFO VariantRecalibrator - Done initializing engine
14:13:51.135 INFO TrainingSet - Found mills track: Known = false Training = true Truth = true Prior = Q12.0
14:13:51.135 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
14:13:51.140 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
14:13:51.157 INFO ProgressMeter - Starting traversal
14:13:51.157 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
14:14:01.167 INFO ProgressMeter - 1:102775044 0.2 163000 977120.6
14:14:11.179 INFO ProgressMeter - 1:240097018 0.3 358000 1072819.9
14:14:21.219 INFO ProgressMeter - 2:114141140 0.5 552000 1101759.8
14:14:31.259 INFO ProgressMeter - 2:239865195 0.7 751000 1123634.7
14:14:41.283 INFO ProgressMeter - 3:118470943 0.8 950000 1137134.4
14:14:51.288 INFO ProgressMeter - 4:42283057 1.0 1152000 1149509.4
14:15:01.330 INFO ProgressMeter - 4:173911496 1.2 1371000 1172262.4
14:15:11.351 INFO ProgressMeter - 5:113585432 1.3 1588000 1188133.6
14:15:21.367 INFO ProgressMeter - 6:51847687 1.5 1801000 1197871.6
14:15:31.423 INFO ProgressMeter - 7:6856323 1.7 2003000 1198611.7
14:15:41.426 INFO ProgressMeter - 7:135942486 1.8 2211000 1203058.0
14:15:51.443 INFO ProgressMeter - 8:95019406 2.0 2419000 1206624.2
14:16:01.475 INFO ProgressMeter - 9:96638194 2.2 2635000 1213186.2
14:16:11.484 INFO ProgressMeter - 10:82722379 2.3 2847000 1217299.6
14:16:21.487 INFO ProgressMeter - 11:70349061 2.5 3058000 1220514.9
14:16:31.492 INFO ProgressMeter - 12:57255908 2.7 3270000 1223687.9
14:16:41.532 INFO ProgressMeter - 13:64582274 2.8 3474000 1223418.9
14:16:51.552 INFO ProgressMeter - 14:94700590 3.0 3692000 1227972.0
14:17:01.578 INFO ProgressMeter - 16:18329817 3.2 3902000 1229486.2
14:17:11.629 INFO ProgressMeter - 17:59683778 3.3 4103000 1228008.0
14:17:21.632 INFO ProgressMeter - 19:20846817 3.5 4311000 1228934.6
14:17:31.651 INFO ProgressMeter - 21:24518253 3.7 4510000 1227244.3
14:17:41.660 INFO ProgressMeter - X:114219365 3.8 4700000 1223411.4
14:17:45.284 INFO ProgressMeter - hs37d5:35441763 3.9 5013459 1284805.0
14:17:45.284 INFO ProgressMeter - Traversal complete. Processed 5013459 total variants in 3.9 minutes.
14:17:45.384 INFO VariantDataManager - DP: mean = 40.42 standard deviation = 12.46
14:17:45.533 INFO VariantDataManager - FS: mean = 1.72 standard deviation = 3.42
14:17:45.620 INFO VariantDataManager - QD: mean = 22.14 standard deviation = 8.22
14:17:45.728 INFO VariantDataManager - SOR: mean = 1.00 standard deviation = 0.57
14:17:45.845 INFO VariantDataManager - MQRankSum: mean = -0.06 standard deviation = 0.53
14:17:45.960 INFO VariantDataManager - ReadPosRankSum: mean = -0.02 standard deviation = 1.04
14:17:46.531 INFO VariantDataManager - Annotation order is: [DP, QD, FS, SOR, MQRankSum, ReadPosRankSum]
14:17:46.578 INFO VariantDataManager - Training with 386380 variants after standard deviation thresholding.
14:17:46.584 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
14:18:10.635 INFO VariantRecalibratorEngine - Finished iteration 0.
14:18:20.561 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 1.35432
14:18:31.042 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.98573
14:18:40.585 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.04389
14:18:49.844 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.02391
14:18:59.044 INFO VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.01427
14:19:08.087 INFO VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.00883
14:19:17.277 INFO VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.00562
14:19:26.755 INFO VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.00373
14:19:36.042 INFO VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.00259
14:19:43.146 INFO VariantRecalibratorEngine - Convergence after 49 iterations!
14:19:43.996 INFO VariantRecalibratorEngine - Evaluating full set of 948869 variants...
14:19:43.996 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
14:19:44.058 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
14:19:44.061 INFO VariantRecalibrator - Shutting down engine
[26 August 2019 14:19:44 IST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 5.92 minutes.
Runtime.totalMemory()=11151081472
java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:656)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:968)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)


Is there anything in the command or log file that could explain why VariantRecalibrator works on the SNPs but not on the indels of the same VCF? Based on the IllegalArgumentException, I'm guessing part of the command is incorrect, but GATK doesn't say which part.

Thanks!
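
One way to narrow down where the SNP and indel runs diverge is to reduce each log to the few lines that describe the model-building step and compare them side by side. The sketch below is only an illustration: the function name `summarize_vqsr_log` and the log file names in the usage note are hypothetical, and the patterns are taken from the messages that appear in the log above. It does not diagnose the cause on its own. It only surfaces lines such as "Selected worst 0 scoring variants", which in the indel log above appears right before the exception.

```shell
# Print the lines of a VariantRecalibrator log that are most useful when
# comparing a working run with a failing one: warnings, the training and
# evaluation counts, the "Selected worst N scoring variants" line, and any
# Java exception. The function name is hypothetical, not part of GATK.
summarize_vqsr_log() {
    grep -E ' WARN |Training with [0-9]+ variants|Evaluating full set|Selected worst [0-9]+ scoring variants|Exception' "$1"
}
```

Assuming the two logs were saved as `snp.log` and `indel.log` (hypothetical names), running `summarize_vqsr_log snp.log` and `summarize_vqsr_log indel.log` and reading the two summaries next to each other shows at a glance which counts and warnings differ between the modes.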

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited September 4

    Hi @robc

    This is a common error, and other users have posted similar issues on this forum. Have you had a chance to look at their suggested solutions and try them?
