VQSR error

Neethu (India; Member)
edited October 2017 in Ask the GATK team

I have been running VQSR with the following command, and it worked very well for approximately 400 samples. Now, after adding a few more samples to the existing ones, I am getting an error from VQSR for the first time.

[root@localhost Process]# java -Xmx8g -XX:ParallelGCThreads=20 -jar /mnt/exome/Softwares/GenomeAnalysisTK.jar -T VariantRecalibrator -R /mnt/exome/ReferenceFiles/human_g1k_v37.fasta -input Combined.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /mnt/exome/Softwares/HG19/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /mnt/exome/Softwares/HG19/1000G_omni2.5.b37.vcf -resource:dbsnp,known=true,training=true,truth=false,prior=10.0 /mnt/exome/Softwares/HG19/dbsnp_hg19_138.vcf -an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
INFO 15:22:17,107 HelpFormatter - ---------------------------------------------------------------------------------
INFO 15:22:17,108 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
INFO 15:22:17,109 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 15:22:17,109 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
INFO 15:22:17,109 HelpFormatter - [Mon Oct 23 15:22:17 IST 2017] Executing on Linux 3.10.0-514.6.1.el7.x86_64 amd64
INFO 15:22:17,109 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_121-b13 JdkDeflater
INFO 15:22:17,112 HelpFormatter - Program Args: -T VariantRecalibrator -R /mnt/exome/ReferenceFiles/human_g1k_v37.fasta -input Combined.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 /mnt/exome/Softwares/HG19/hapmap_3.3.b37.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 /mnt/exome/Softwares/HG19/1000G_omni2.5.b37.vcf -resource:dbsnp,known=true,training=true,truth=false,prior=10.0 /mnt/exome/Softwares/HG19/dbsnp_hg19_138.vcf -an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum -mode SNP -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 -recalFile recalibrate_SNP.recal -tranchesFile recalibrate_SNP.tranches -rscriptFile recalibrate_SNP_plots.R
INFO 15:22:17,117 HelpFormatter - Executing as root@localhost.localdomain on Linux 3.10.0-514.6.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_121-b13.
INFO 15:22:17,117 HelpFormatter - Date/Time: 2017/10/23 15:22:17
INFO 15:22:17,117 HelpFormatter - ---------------------------------------------------------------------------------
INFO 15:22:17,118 HelpFormatter - ---------------------------------------------------------------------------------
INFO 15:22:17,135 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:22:17,218 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO 15:22:17,546 GenomeAnalysisEngine - Preparing for traversal
INFO 15:22:17,551 GenomeAnalysisEngine - Done preparing for traversal
INFO 15:22:17,551 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 15:22:17,552 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 15:22:17,552 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
INFO 15:22:17,557 TrainingSet - Found hapmap track: Known = false Training = true Truth = true Prior = Q15.0
INFO 15:22:17,558 TrainingSet - Found omni track: Known = false Training = true Truth = true Prior = Q12.0
INFO 15:22:17,558 TrainingSet - Found dbsnp track: Known = true Training = true Truth = false Prior = Q10.0
INFO 15:22:47,554 ProgressMeter - 2:55755921 6684477.0 30.0 s 4.0 s 9.8% 5.1 m 4.6 m
INFO 15:23:17,556 ProgressMeter - 3:113288945 1.3557799E7 60.0 s 4.0 s 19.5% 5.1 m 4.1 m
INFO 15:23:47,557 ProgressMeter - 5:16225713 2.0535053E7 90.0 s 4.0 s 28.9% 5.2 m 3.7 m
INFO 15:24:17,558 ProgressMeter - 6:136574731 2.7525682E7 120.0 s 4.0 s 38.7% 5.2 m 3.2 m
INFO 15:24:47,559 ProgressMeter - 8:95110754 3.4599575E7 2.5 m 4.0 s 48.0% 5.2 m 2.7 m
INFO 15:25:17,560 ProgressMeter - 10:116951732 4.1411494E7 3.0 m 4.0 s 57.9% 5.2 m 2.2 m
INFO 15:25:47,561 ProgressMeter - 12:126998698 4.8109015E7 3.5 m 4.0 s 67.0% 5.2 m 103.0 s
INFO 15:26:17,562 ProgressMeter - 16:3932287 5.4821274E7 4.0 m 4.0 s 77.8% 5.1 m 68.0 s
INFO 15:26:47,563 ProgressMeter - 19:37901784 6.1660969E7 4.5 m 4.0 s 87.0% 5.2 m 40.0 s
INFO 15:27:17,323 VariantDataManager - DP: mean = 23089.37 standard deviation = 15289.49
INFO 15:27:17,395 VariantDataManager - QD: mean = 11.97 standard deviation = 5.15
INFO 15:27:17,431 VariantDataManager - FS: mean = 2.00 standard deviation = 9.01
INFO 15:27:17,462 VariantDataManager - MQRankSum: mean = -0.21 standard deviation = 1.39
INFO 15:27:17,491 VariantDataManager - ReadPosRankSum: mean = 0.42 standard deviation = 1.03
INFO 15:27:17,564 ProgressMeter - GL000202.1:10465 6.8459572E7 5.0 m 4.0 s 99.8% 5.0 m 0.0 s
INFO 15:27:17,702 VariantDataManager - Annotations are now ordered by their information content: [DP, QD, FS, ReadPosRankSum, MQRankSum]
INFO 15:27:17,724 VariantDataManager - Training with 195822 variants after standard deviation thresholding.
INFO 15:27:17,727 GaussianMixtureModel - Initializing model with 100 k-means iterations...
INFO 15:27:27,669 VariantRecalibratorEngine - Finished iteration 0.
INFO 15:27:32,206 VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 1.13417
INFO 15:27:36,627 VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.59423
INFO 15:27:41,171 VariantRecalibratorEngine - Finished iteration 15. Current change in mixture coefficients = 0.20486
INFO 15:27:46,168 VariantRecalibratorEngine - Finished iteration 20. Current change in mixture coefficients = 0.03726
INFO 15:27:47,565 ProgressMeter - GL000202.1:10465 6.8459572E7 5.5 m 4.0 s 99.8% 5.5 m 0.0 s
INFO 15:27:51,022 VariantRecalibratorEngine - Finished iteration 25. Current change in mixture coefficients = 0.03971
INFO 15:27:55,971 VariantRecalibratorEngine - Finished iteration 30. Current change in mixture coefficients = 0.04863
INFO 15:28:00,943 VariantRecalibratorEngine - Finished iteration 35. Current change in mixture coefficients = 0.03344
INFO 15:28:05,845 VariantRecalibratorEngine - Finished iteration 40. Current change in mixture coefficients = 0.03454
INFO 15:28:10,855 VariantRecalibratorEngine - Finished iteration 45. Current change in mixture coefficients = 0.04853
INFO 15:28:15,849 VariantRecalibratorEngine - Finished iteration 50. Current change in mixture coefficients = 0.10959
INFO 15:28:17,566 ProgressMeter - GL000202.1:10465 6.8459572E7 6.0 m 5.0 s 99.8% 6.0 m 0.0 s
INFO 15:28:20,957 VariantRecalibratorEngine - Finished iteration 55. Current change in mixture coefficients = 0.00585
INFO 15:28:26,167 VariantRecalibratorEngine - Finished iteration 60. Current change in mixture coefficients = 0.00347
INFO 15:28:31,206 VariantRecalibratorEngine - Finished iteration 65. Current change in mixture coefficients = 0.00198
INFO 15:28:31,206 VariantRecalibratorEngine - Convergence after 65 iterations!
INFO 15:28:31,895 VariantRecalibratorEngine - Evaluating full set of 361653 variants...
INFO 15:28:31,917 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.

ERROR --
ERROR stack trace

java.lang.IllegalArgumentException: No data found.
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:88)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:489)
at org.broadinstitute.gatk.tools.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:185)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: No data found.
ERROR ------------------------------------------------------------------------------------------
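A note on reading the log above: the last INFO line before the stack trace reports "Training with worst 0 scoring variants", and the exception is thrown from `generateModel`. That suggests, though the thread does not confirm it, that no variants fell below the LOD cutoff, so the model built from the worst-scoring variants had no data to train on. A minimal shell sketch for spotting this condition in a saved log follows; the function name `worst_variant_count` is made up for illustration, and the sample line is copied from the log in this post.

```shell
# Hypothetical helper (not part of GATK): read a VariantRecalibrator log on
# stdin and print N from the last "Training with worst N scoring variants" line.
worst_variant_count() {
  sed -n 's/.*Training with worst \([0-9][0-9]*\) scoring variants.*/\1/p' | tail -n 1
}

# Sample line copied from the log in this post.
line='INFO 15:28:31,917 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000.'
count=$(printf '%s\n' "$line" | worst_variant_count)

if [ "$count" = "0" ]; then
  echo "No variants were selected as the worst-scoring training set."
fi
```

To check a full log file instead, something like `worst_variant_count < vqsr.log` should work, where `vqsr.log` is a placeholder file name.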
GitHub issue #2611 (filed by Sheila), state: open

Answers

  • Neethu (India; Member)

    Hello,
    It would be really great if someone could reply soon. Thank you.

  • Neethu (India; Member)
    edited October 2017

    It worked with --maxGaussians 4

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Neethu
    Hi,

    Glad you found a workaround. We usually recommend that setting when working with fewer than 30 exome samples, but it seems to work for your case as well. I will check with the team whether that is okay. Can you also try the latest version without --maxGaussians 4?

    Thanks,
    Sheila

  • Neethu (India; Member)

    Hi,
    I will try the latest version and let you know. Please also let me know whether using --maxGaussians 4 is OK in my case.

    Thank you

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
    Accepted Answer

    Hi @Neethu,

    The VQSR algorithm has some quirks that occasionally lead to odd behavior depending on the data included in the analysis. It's possible that the samples you added somehow destabilized the model, especially if they were not generated in exactly the same way as the others. But as long as the updated cohort was joint-called together, your results should be OK if the model worked with --maxGaussians 4. Lowering this parameter reduces the complexity of the model, so it tends to be more stable, at the expense of some resolution.
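
For reference, the workaround discussed in this thread can be sketched as the original command with `--maxGaussians 4` appended. This is a sketch under assumptions, not a recommended configuration: every path and argument is copied from the original post, while the `RESOURCES` shorthand, the `cmd_line` variable, and the `DRY_RUN` switch are names introduced here only for illustration. By default the script just prints the command so it can be reviewed first.

```shell
# Sketch only: the VariantRecalibrator call from the original post with
# --maxGaussians 4 appended. All paths are copied from that post.
RESOURCES=/mnt/exome/Softwares/HG19   # shorthand introduced here

# Build the command as positional parameters so each argument stays intact.
set -- java -Xmx8g -XX:ParallelGCThreads=20 \
  -jar /mnt/exome/Softwares/GenomeAnalysisTK.jar \
  -T VariantRecalibrator \
  -R /mnt/exome/ReferenceFiles/human_g1k_v37.fasta \
  -input Combined.vcf \
  -resource:hapmap,known=false,training=true,truth=true,prior=15.0 "$RESOURCES/hapmap_3.3.b37.vcf" \
  -resource:omni,known=false,training=true,truth=true,prior=12.0 "$RESOURCES/1000G_omni2.5.b37.vcf" \
  -resource:dbsnp,known=true,training=true,truth=false,prior=10.0 "$RESOURCES/dbsnp_hg19_138.vcf" \
  -an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum \
  -mode SNP \
  -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
  -recalFile recalibrate_SNP.recal \
  -tranchesFile recalibrate_SNP.tranches \
  -rscriptFile recalibrate_SNP_plots.R \
  --maxGaussians 4

# Flattened copy of the command, used only for display.
cmd_line=$(printf '%s ' "$@")

# DRY_RUN is a hypothetical switch: print by default, run only when set to 0.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "$cmd_line"
else
  "$@"
fi
```

Running the script with `DRY_RUN=0` executes the command instead of printing it. As noted in the answer above, the lower Gaussian count trades some resolution for stability, so it is worth confirming the resulting tranches and plots look reasonable.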
