INDEL + VariantRecalibrator "Training with very few variant sites"

lindenb (France), Member

VariantRecalibrator seems to fail on my HaloPlex dataset because, as far as I understand, there are not enough indels in it.

WARN  16:06:52,414 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable. 
INFO  16:06:52,420 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
INFO  16:06:52,482 VariantRecalibratorEngine - Finished iteration 0. 
INFO  16:06:52,506 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.01625 
INFO  16:06:52,514 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.00691 
INFO  16:06:52,522 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02285 
INFO  16:06:52,529 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00935 
INFO  16:06:52,534 VariantRecalibratorEngine - Convergence after 24 iterations! 
INFO  16:06:52,539 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000. 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
    at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(HierarchicalMicroScheduler.java:190)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)

Of course, this breaks my workflow :-)

Would it be possible to generate a 'mock' recalFile that would tell ApplyRecalibration:

"there is no data to recalibrate but write a VCF anyway".

What would the recalFile look like?

Thanks,

Pierre
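One way to keep a workflow running without knowing the internal layout of a recal file is to avoid faking one at all: wrap the two steps and, if VariantRecalibrator fails, copy the input VCF through unfiltered. This is a hypothetical sketch, not a GATK feature: the function name `recalibrate_or_passthrough` and its arguments are illustrative. It assumes the tool exits with a non-zero status when it fails, and that the caller supplies complete command lines for VariantRecalibrator and ApplyRecalibration (GATK 3) or ApplyVQSR (GATK 4).

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def recalibrate_or_passthrough(
    recal_cmd: Sequence[str],
    apply_cmd: Sequence[str],
    input_vcf: Path,
    output_vcf: Path,
    run: Callable[[Sequence[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> bool:
    """Run the recalibration step, then the apply step (hypothetical wrapper).

    Assumes a non-zero exit status means the tool failed. If the recalibration
    step fails (e.g. too few training sites), the input VCF is copied to
    output_vcf unchanged so downstream steps still receive a file.
    Returns True if recalibration was applied, False if it fell back.
    """
    if run(recal_cmd) != 0:
        # Fallback: unfiltered pass-through, no VQSLOD annotations added.
        shutil.copyfile(input_vcf, output_vcf)
        return False
    if run(apply_cmd) != 0:
        raise RuntimeError("apply step failed after a successful recalibration")
    return True
```

A pass-through VCF carries no VQSLOD annotations or VQSR filter values, so downstream steps should not assume those fields exist.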


Answers

  • Hi,
    Has the above issue been fixed? I am getting the same warning message.

  • bhanuGandham (Member, Administrator, Broadie, Moderator)

    Hi @DamandeepKaur

    Please send me the exact command you are using, the error message, the dataset you are running the tool on, and the version of GATK you are using.

    Thank you

    Regards
    Bhanu

  • DamandeepKaur (Member), edited October 19

    Hi bhanu,

    I am using GATK 4.0.4.0.

    The dataset is the output of HaplotypeCaller, followed by consolidating the VCFs and joint calling.

    I am running VQSR as follows:
    A) VariantRecalibrator in SNP mode
    B) ApplyVQSR in SNP mode
    C) VariantRecalibrator in indel mode
    D) ApplyVQSR in indel mode
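A small helper that builds all four steps from one set of file names makes it harder for the steps to drift apart. As posted in the commands that follow, step B writes `2recal.SNPs.vcf`, while step C reads `recal.SNPs.vcf` and step D reads `re2recal.SNPs.vcf`, so it may be worth confirming which file each step actually consumed. This is a hypothetical sketch: `vqsr_commands`, its parameters and the derived file names are illustrative. It uses only GATK flags that appear in this thread's commands; resources, annotations and tranche options are passed in through `snp_args` and `indel_args`.

```python
from typing import List, Sequence


def vqsr_commands(
    gatk: str,
    ref: str,
    input_vcf: str,
    prefix: str,
    snp_args: Sequence[str] = (),
    indel_args: Sequence[str] = (),
    level: str = "99.0",
) -> List[List[str]]:
    """Build the four VQSR steps so each step reads the previous step's output."""
    snp_recal, snp_tranches = f"{prefix}output.recal", f"{prefix}output.tranches"
    indel_recal, indel_tranches = f"{prefix}output2.recal", f"{prefix}output2.tranches"
    snp_vcf = f"{prefix}recal.SNPs.vcf"
    final_vcf = f"{prefix}recal.SNPs_indels.vcf"
    return [
        # A) VariantRecalibrator, SNP mode, on the joint-called VCF
        [gatk, "VariantRecalibrator", "-R", ref, "-V", input_vcf, *snp_args,
         "-mode", "SNP", "--output", snp_recal, "--tranches-file", snp_tranches],
        # B) ApplyVQSR, SNP mode: writes snp_vcf
        [gatk, "ApplyVQSR", "-R", ref, "-V", input_vcf, "-O", snp_vcf,
         "--truth-sensitivity-filter-level", level,
         "--tranches-file", snp_tranches, "--recal-file", snp_recal, "-mode", "SNP"],
        # C) VariantRecalibrator, INDEL mode: reads the SNP-recalibrated VCF from B
        [gatk, "VariantRecalibrator", "-R", ref, "-V", snp_vcf, *indel_args,
         "-mode", "INDEL", "--output", indel_recal, "--tranches-file", indel_tranches],
        # D) ApplyVQSR, INDEL mode: also reads the VCF written by B
        [gatk, "ApplyVQSR", "-R", ref, "-V", snp_vcf, "-O", final_vcf,
         "--truth-sensitivity-filter-level", level,
         "--tranches-file", indel_tranches, "--recal-file", indel_recal, "-mode", "INDEL"],
    ]
```

Each inner list can be passed directly to `subprocess.run`; building them from shared variables means a renamed file changes in every step at once.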

    Here are the commands for all four steps:

    A) ~/bin/gatk-4.0.4.0/./gatk VariantRecalibrator -R ~/Project/try_on_local/reference/hg38.fa -V final_variants.vcf -resource hapmap,known=false,training=true,truth=true,prior=15.0:hapmap_3.3.hg38.vcf -resource omni,known=false,training=true,truth=false,prior=12.0:modified_1000G_omni2.5.hg38.vcf -resource 1000G,known=false,training=true,truth=false,prior=10.0:sorted_1000G_phase1.snps.high_confidence.hg38.vcf -resource dbsnp,known=true,training=false,truth=false,prior=2.0:sorted_dbsnp_146hg38.vcf -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -mode SNP --output 2output.recal --tranches-file 2output.tranches --rscript-file 2output.plots.R

    B) ~/bin/gatk-4.0.4.0/./gatk ApplyVQSR -R ~/Project/try_on_local/reference/hg38.fa -V final_variants.vcf -O 2recal.SNPs.vcf --truth-sensitivity-filter-level 99.0 --tranches-file 2output.tranches --recal-file 2output.recal -mode SNP

    C) ~/bin/gatk-4.0.4.0/./gatk VariantRecalibrator -R ~/Project/try_on_local/reference/hg38.fa -V recal.SNPs.vcf --max-gaussians 4 -resource mills,known=false,training=true,truth=true,prior=12:sorted_Mills_1000_hg38.vcf -resource dbsnp,known=true,training=false,truth=false,prior=2:sorted_dbsnp_146hg38.vcf -an QD -an DP -an FS -an SOR -an ReadPosRankSum -an MQRankSum -mode INDEL -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 --output 2output2.recal --tranches-file 2output2.tranches --rscript-file 2output2.plots.R

    The tranches file produced in this step is empty, and the log shows:

    18:20:41.805 WARN VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
    18:20:41.807 INFO GaussianMixtureModel - Initializing model with 100 k-means iterations...
    18:20:41.855 INFO VariantRecalibratorEngine - Finished iteration 0.
    18:20:41.875 INFO VariantRecalibratorEngine - Finished iteration 5. Current change in mixture coefficients = 0.72305
    18:20:41.890 INFO VariantRecalibratorEngine - Finished iteration 10. Current change in mixture coefficients = 0.03159
    18:20:41.900 INFO VariantRecalibratorEngine - Convergence after 13 iterations!
    18:20:41.904 INFO VariantRecalibratorEngine - Evaluating full set of 232696 variants...
    18:20:41.904 WARN VariantRecalibratorEngine - Evaluate datum returned a NaN.
    18:20:41.914 INFO VariantDataManager - Selected worst 0 scoring variants --> variants with LOD <= -5.0000.
    18:20:41.917 INFO VariantRecalibrator - Shutting down engine
    [October 18, 2018 6:20:41 PM CEST] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 4.32 minutes.
    Runtime.totalMemory()=1896349696
    java.lang.IllegalArgumentException: No data found.
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:34)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:629)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:894)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

    D) ~/bin/gatk-4.0.4.0/./gatk ApplyVQSR -R ~/Project/try_on_local/reference/hg38.fa -V re2recal.SNPs.vcf -O 2recal.SNPs_indels.vcf --truth-sensitivity-filter-level 99.0 --tranches-file 2output2.tranches --recal-file 2output2.recal -mode INDEL
    Using GATK jar /home/damandeep/bin/gatk-4.0.4.0/gatk-package-4.0.4.0-local.jar


    A USER ERROR has occurred: No tranches were found in the file or were above the truth sensitivity filter level 99.0
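Because ApplyVQSR stops when the tranches file holds no usable tranche, a pre-check can catch an empty tranches file before step D runs. This sketch assumes the tranches file is CSV text with `#` comment lines followed by a header row whose first column is `targetTruthSensitivity`; that layout and the helper names `tranche_levels` and `can_apply` are assumptions, not a documented interface. `can_apply` simply mirrors the wording of the error message by asking whether any tranche reaches the `--truth-sensitivity-filter-level`.

```python
import csv
from typing import List


def tranche_levels(path: str) -> List[float]:
    """Return the targetTruthSensitivity values listed in a tranches file.

    Assumes '#' comment lines followed by a CSV header row and data rows.
    """
    with open(path, newline="") as fh:
        rows = [line for line in fh if line.strip() and not line.startswith("#")]
    if not rows:
        return []  # completely empty file
    # A header row with no data rows also yields an empty list.
    return [float(row["targetTruthSensitivity"]) for row in csv.DictReader(rows)]


def can_apply(path: str, filter_level: float) -> bool:
    """True if at least one tranche reaches the requested filter level."""
    return any(level >= filter_level for level in tranche_levels(path))
```

Calling `can_apply("2output2.tranches", 99.0)` before step D would flag the empty file from step C instead of letting ApplyVQSR fail.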


    This is my second attempt at VQSR. The first time, I ran it without supplying an interval list to HaplotypeCaller; it worked, but my final Ti/Tv ratio was 1.29. So this time I supplied an interval list, a .bed file from UCSC with the start and end positions of exons.
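For context on the 1.29 figure: Ti/Tv is the number of transitions (A<->G, C<->T) divided by the number of transversions among SNPs. Commonly cited expectations are roughly 2.0 to 2.1 for human whole-genome calls and higher, around 3, within exome targets, so a value near 1.3 usually points to many false-positive SNPs. A minimal sketch, assuming tab-separated VCF body lines; `snp_pairs` and `titv_ratio` are hypothetical helpers, and multi-allelic sites are split into one (REF, ALT) pair per ALT allele.

```python
from typing import Iterable, Iterator, Tuple

# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def snp_pairs(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (REF, ALT) pair per ALT allele from the VCF body lines."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        for alt in fields[4].split(","):
            yield fields[3], alt


def titv_ratio(pairs: Iterable[Tuple[str, str]]) -> float:
    """Transitions / transversions over single-base substitutions only."""
    ti = tv = 0
    for ref, alt in pairs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, MNPs, '*' and any non-ACGT allele.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```

For example, `titv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion and returns 2.0.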

    I have 29 exomes in total: 18 are my own samples, and I added 11 others to run through VQSR.

    Thank you.

    Regards,
    Damandeep Kaur

  • DamandeepKaur (Member), edited October 19

    I would like to add one thing: my -V VCF has 295088 variants, of which 233349 are indels and 61739 are SNPs.
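These counts mean about 79% of the records are indels (233349 of 295088), whereas in typical human exome call sets SNPs greatly outnumber indels, so the skew itself may be worth investigating. The thread does not say how the counts were produced; a minimal way to reproduce them from the VCF body is sketched here. `classify`, `count_types` and the SNP/INDEL/MNP/MIXED/OTHER labels are hypothetical conventions of this sketch, not GATK's own variant-type definitions.

```python
from collections import Counter
from typing import Iterable


def classify(ref: str, alt: str) -> str:
    """Label one REF/ALT allele pair (hypothetical convention for this sketch)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "INDEL"
    return "MNP"  # same length, more than one base


def count_types(vcf_lines: Iterable[str]) -> Counter:
    """Count sites by type; a site whose ALT alleles disagree counts as MIXED."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Ignore spanning-deletion (*), missing (.) and symbolic (<...>) alleles.
        alts = [a for a in fields[4].split(",")
                if a not in ("*", ".") and not a.startswith("<")]
        kinds = {classify(fields[3], alt) for alt in alts}
        if not kinds:
            counts["OTHER"] += 1
        elif len(kinds) == 1:
            counts[kinds.pop()] += 1
        else:
            counts["MIXED"] += 1
    return counts
```

Reading the VCF with `open(path)` and passing the file handle to `count_types` gives per-type site counts that can be compared with the numbers above.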

  • bhanuGandham (Member, Administrator, Broadie, Moderator)

    Hi @DamandeepKaur

    To run VQSR with VariantRecalibrator, you should have at least 30 exomes. For more info, please click here.

    Regards
    Bhanu
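A quick way to check a call set against this 30-exome guideline is to count the sample columns that follow FORMAT on the VCF's `#CHROM` header line. The helpers `sample_names` and `enough_samples_for_vqsr` are hypothetical; the default threshold of 30 comes from the reply above. With the 29 exomes described earlier in the thread, the check would return False.

```python
from typing import Iterable, List

# CHROM POS ID REF ALT QUAL FILTER INFO FORMAT precede the sample columns.
FIXED_COLUMNS = 9


def sample_names(vcf_lines: Iterable[str]) -> List[str]:
    """Return the sample names from the #CHROM header line."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[FIXED_COLUMNS:]
        if not line.startswith("#"):
            break  # reached the body without seeing a #CHROM line
    raise ValueError("no #CHROM header line found")


def enough_samples_for_vqsr(vcf_lines: Iterable[str], minimum: int = 30) -> bool:
    """True if the call set has at least `minimum` samples."""
    return len(sample_names(vcf_lines)) >= minimum
```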
