VariantRecalibrator Engine failure

ekofman Member, Broadie

Hi,

I'm trying to run the following on WGS samples:

/gatk/build/install/gatk/bin/gatk VariantRecalibrator \
    -R /cromwell_root/fc-f36b3dc8-85f7-4d7f-bc99-a4610229d66a/broadinstitute/reference/hg19/fasta/Homo_sapiens_assembly19.fasta \
    -V /cromwell_root/fc-123f0687-7fec-4d69-a5a3-9b4ac93beab0/7e1d530b-33ec-46a2-9967-441071e5d714/jointgenotype_genotype/ffe4d42d-6465-45de-b0bc-01fbd5f4622a/call-ApplyRecalSNP/normalsampleset.recal.snp.g.vcf.gz \
    --resource mills,known=false,training=true,truth=true,prior=12.0:recalibrate/Mills_and_1000G_gold_standard.indels.b37.vcf \
    --resource dbsnp,known=true,training=false,truth=false,prior=2.0:recalibrate/dbsnp_138.b37.vcf \
    --use-annotation QD \
    --use-annotation MQRankSum \
    --use-annotation ReadPosRankSum \
    --use-annotation FS \
    --mode INDEL \
    -tranche 100.0 -tranche 99.9 -tranche 99.8 -tranche 99.7 -tranche 99.6 \
    -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.2 -tranche 99.1 \
    -tranche 99.0 -tranche 98.9 -tranche 98.8 -tranche 98.6 -tranche 98.5 \
    -tranche 98.3 -tranche 98.2 -tranche 98.1 -tranche 98.0 -tranche 97.9 \
    -tranche 97.8 -tranche 97.5 -tranche 97.0 -tranche 95.0 -tranche 90.0 \
    --output normalsampleset.INDEL.recal \
    --tranches-file normalsampleset.INDEL.tranches \
    --rscript-file normalsampleset.INDEL.R

It is failing halfway with a cryptic error:

15:39:10.737 INFO VariantRecalibrator - Done initializing engine
15:39:10.767 INFO TrainingSet - Found mills track: Known = false Training = true Truth = true Prior = Q12.0
15:39:10.768 INFO TrainingSet - Found dbsnp track: Known = true Training = false Truth = false Prior = Q2.0
15:39:10.775 WARN GATKVariantContextUtils - Can't determine output variant file format from output file extension "recal". Defaulting to VCF.
15:39:10.795 INFO ProgressMeter - Starting traversal
15:39:10.795 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
15:39:20.827 INFO ProgressMeter - 1:219441985 0.2 191000 1142344.5
15:39:30.854 INFO ProgressMeter - 2:202422268 0.3 439000 1313126.3
15:39:40.880 INFO ProgressMeter - 3:44195999 0.5 687000 1370118.0
15:39:50.923 INFO ProgressMeter - 4:29584552 0.7 928000 1387559.8
15:40:00.940 INFO ProgressMeter - 5:20409595 0.8 1200000 1435836.1
15:40:10.978 INFO ProgressMeter - 6:21835741 1.0 1437000 1432654.3
15:40:20.994 INFO ProgressMeter - 7:23090705 1.2 1706000 1458161.2
15:40:31.056 INFO ProgressMeter - 8:39816783 1.3 1964000 1468210.0
15:40:41.078 INFO ProgressMeter - 9:28887875 1.5 2206000 1466073.0
15:40:51.109 INFO ProgressMeter - 10:32387304 1.7 2480000 1483342.3
15:41:01.113 INFO ProgressMeter - 11:39779480 1.8 2767000 1504922.1
15:41:11.211 INFO ProgressMeter - 12:44466895 2.0 3038000 1513764.9
15:41:21.242 INFO ProgressMeter - 14:21930361 2.2 3312000 1523377.3
15:41:31.269 INFO ProgressMeter - 16:18076709 2.3 3594000 1535088.3
15:41:41.269 INFO ProgressMeter - 17:22089136 2.5 3871000 1543522.5
15:41:51.281 INFO ProgressMeter - 18:27782257 2.7 4104000 1534339.4
15:42:01.290 INFO ProgressMeter - 19:36452733 2.8 4368000 1537180.2
15:42:11.307 INFO ProgressMeter - 20:61623489 3.0 4622000 1536296.8
15:42:21.318 INFO ProgressMeter - X:14415975 3.2 4917000 1548474.5
15:42:25.221 INFO ProgressMeter - GL000192.1:348384 3.2 5103668 1574995.5
15:42:25.221 INFO ProgressMeter - Traversal complete. Processed 5103668 total variants in 3.2 minutes.
15:42:25.291 INFO VariantDataManager - QD: mean = 20.37 standard deviation = 8.35
15:42:25.374 INFO VariantDataManager - MQRankSum: mean = 0.18 standard deviation = 0.65
15:42:25.456 INFO VariantDataManager - ReadPosRankSum: mean = 0.26 standard deviation = 0.96
15:42:25.524 INFO VariantDataManager - FS: mean = 3.15 standard deviation = 5.59
15:42:25.968 INFO VariantDataManager - Annotations are now ordered by their information content: [QD, FS, MQRankSum, ReadPosRankSum]
15:42:26.000 INFO VariantDataManager - Training with 332973 variants after standard deviation thresholding.
15:42:26.007 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
15:42:47.908 INFO VariantRecalibratorEngine - Finished iteration 0.
15:42:54.626 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 1.45712
15:43:00.822 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.09152
15:43:07.009 INFO VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.09895
15:43:13.567 INFO VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.08980
15:43:19.994 INFO VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.08297
15:43:26.272 INFO VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.11782
15:43:32.744 INFO VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.10006
15:43:39.222 INFO VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.15533
15:43:46.351 INFO VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.01678
15:43:53.432 INFO VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.00999
15:44:00.404 INFO VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.00607
15:44:07.300 INFO VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.00359
15:44:14.207 INFO VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.00190
15:44:14.207 INFO VariantRecalibratorEngine - Convergence after 65 iterations!
15:44:14.830 INFO VariantRecalibratorEngine - Evaluating full set of 946745 variants...
15:44:14.830 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
15:44:14.862 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
15:44:14.897 INFO VariantRecalibrator - Shutting down engine
[March 21, 2018 3:44:14 PM UTC] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 5.11 minutes.
Runtime.totalMemory()=2712666112
java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:895)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:277)

I'm not sure what the 'no data found' error refers to. Does it mean filtering is too strict and leaving no variants to output or something of that sort?

Thanks for the help.
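When triaging a log like the one above, the entries just before the exception tend to be the most informative. Here they are the "Evaluate datum returned a NaN." warning and the "Selected worst 0 scoring variants" message. The following is a minimal sketch for pulling those entries out of a saved log; the function name vqsr_log_summary is hypothetical, and it assumes the log file has one entry per line, as GATK writes it.

```shell
# Print the WARN entries, the "Selected worst N scoring variants" entry,
# and the exception line from a saved GATK log (one entry per line assumed).
# vqsr_log_summary is a hypothetical helper name, not part of GATK.
vqsr_log_summary() {
    grep -E 'WARN|Selected worst [0-9]+ scoring variants|IllegalArgumentException' "$1"
}
```

Usage, assuming the tool's output was saved to a file named vqsr.log: vqsr_log_summary vqsr.log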

Answers

  • shlee (Cambridge) Member, Broadie

    Hi @ekofman,

    You are getting the error:

    No data found.
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
        at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:895)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:153)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
        at org.broadinstitute.hellbender.Main.main(Main.java:277)

    Can you tell us which version of GATK you are running VariantRecalibrator from?

    Given --mode INDEL, is it possible there are no indels in your VCF? Can you confirm this in normalsampleset.recal.snp.g.vcf.gz? The "No data found." error is thrown when the VariantRecalibratorEngine encounters an empty dataset.
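
    One way to run the check suggested above, as a minimal sketch: count the records in which REF and at least one ALT allele differ in length. The function name count_indels is hypothetical, and the length comparison is only a rough indel test (symbolic alleles such as <NON_REF> are skipped, breakend notation is not handled). It uses only gzip and awk, on the assumption that a dedicated VCF tool may not be installed.

    ```shell
    # Rough indel count for a plain-text or gzip/bgzip-compressed VCF.
    # count_indels is a hypothetical helper name, not a GATK command.
    count_indels() {
        case "$1" in
            *.gz) gzip -dc "$1" ;;
            *)    cat "$1" ;;
        esac | awk -F'\t' '
            /^#/ { next }                     # skip header lines
            {
                n = split($5, alts, ",")      # ALT may list several alleles
                for (i = 1; i <= n; i++) {
                    a = alts[i]
                    # skip symbolic (<...>), spanning-deletion (*) and missing (.) alleles
                    if (a ~ /^</ || a == "*" || a == ".") continue
                    # a length difference between REF ($4) and ALT marks an indel
                    if (length(a) != length($4)) { count++; break }
                }
            }
            END { print count + 0 }           # print 0 when nothing matched
        '
    }
    ```

    Usage: count_indels normalsampleset.recal.snp.g.vcf.gz. A result of 0 would be consistent with the empty-dataset explanation; a large count would point to a different cause.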

  • ekofman Member, Broadie
    edited March 2018 Accepted Answer

    Hi @shlee, it was GATK4.0; however, this problem was actually fixed just now by setting the parameter --max-gaussians=4 based on some other forum postings. Can you elucidate why this might have fixed it? We are guessing it has something to do with the number of variants?
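
For reference, the fix reported above amounts to adding one flag to the original INDEL-mode command. The sketch below only prints the adjusted command (a dry run) so it can be inspected before running. The function name build_indel_vqsr_cmd is hypothetical, the file paths are shortened placeholders modeled on the post, and the tranche list is abbreviated for readability. The thread itself does not explain why the lower value helps.

```shell
# Print (do not run) an INDEL-mode VariantRecalibrator command with
# --max-gaussians set. build_indel_vqsr_cmd is a hypothetical helper;
# paths are placeholders and the tranche list is abbreviated.
build_indel_vqsr_cmd() {
    max_gaussians="${1:-4}"   # 4 is the value reported to work in this thread
    printf '%s ' \
        gatk VariantRecalibrator \
        -R Homo_sapiens_assembly19.fasta \
        -V normalsampleset.recal.snp.g.vcf.gz \
        --resource mills,known=false,training=true,truth=true,prior=12.0:recalibrate/Mills_and_1000G_gold_standard.indels.b37.vcf \
        --resource dbsnp,known=true,training=false,truth=false,prior=2.0:recalibrate/dbsnp_138.b37.vcf \
        --use-annotation QD --use-annotation MQRankSum \
        --use-annotation ReadPosRankSum --use-annotation FS \
        --mode INDEL \
        --max-gaussians "$max_gaussians" \
        -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
        --output normalsampleset.INDEL.recal \
        --tranches-file normalsampleset.INDEL.tranches
    printf '\n'
}
```

Usage: build_indel_vqsr_cmd 4 prints the command with --max-gaussians 4; passing a different number lets you compare other settings.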
