INDEL + VariantRecalibrator "Training with very few variant sites"

VariantRecalibrator seems to fail for my HaloPlex dataset because, as far as I understand, there are not enough indels in it.

WARN  16:06:52,414 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable. 
INFO  16:06:52,420 GaussianMixtureModel - Initializing model with 100 k-means iterations... 
INFO  16:06:52,482 VariantRecalibratorEngine - Finished iteration 0. 
INFO  16:06:52,506 VariantRecalibratorEngine - Finished iteration 5.    Current change in mixture coefficients = 0.01625 
INFO  16:06:52,514 VariantRecalibratorEngine - Finished iteration 10.   Current change in mixture coefficients = 0.00691 
INFO  16:06:52,522 VariantRecalibratorEngine - Finished iteration 15.   Current change in mixture coefficients = 0.02285 
INFO  16:06:52,529 VariantRecalibratorEngine - Finished iteration 20.   Current change in mixture coefficients = 0.00935 
INFO  16:06:52,534 VariantRecalibratorEngine - Convergence after 24 iterations! 
INFO  16:06:52,539 VariantDataManager - Training with worst 0 scoring variants --> variants with LOD <= -5.0000. 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.gatk.utils.exceptions.ReviewedGATKException: Unable to retrieve result
    at org.broadinstitute.gatk.engine.executive.HierarchicalMicroScheduler.execute(
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(

Of course, this breaks my workflow :-)

Would it be possible to generate a 'mock' recalFile that would tell ApplyRecalibration:

"there is no data to recalibrate but write a VCF anyway".

What would the recalFile look like?
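
If a mock recalFile is not feasible, an alternative I could imagine is wrapping the two steps so that the workflow copies the input VCF to the output when recalibration fails. Below is a minimal sketch of that idea, assuming the failing VariantRecalibrator run exits with a non-zero status. The names `recalibrate_or_passthrough`, `recal_cmd` and `apply_cmd` are made up for illustration, and the commands in the demo are stand-ins, not real GATK invocations.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def recalibrate_or_passthrough(recal_cmd, apply_cmd, input_vcf, output_vcf):
    """Run recalibration; on failure, copy the input VCF to the output unchanged.

    recal_cmd and apply_cmd are full argument lists supplied by the caller
    (hypothetical placeholders; the real GATK command lines are not shown here).
    Returns True if recalibration was applied, False if the input was passed through.
    """
    # Assumption: a failed recalibration step exits with a non-zero status.
    recal = subprocess.run(recal_cmd)
    if recal.returncode == 0:
        # Recalibration worked, so the apply step is expected to write output_vcf.
        subprocess.run(apply_cmd, check=True)
        return True
    # Fallback: no model could be built, so emit the unrecalibrated calls as-is.
    shutil.copyfile(input_vcf, output_vcf)
    return False


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without GATK installed.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    noop = [sys.executable, "-c", "pass"]
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.vcf"
        dst = Path(tmp) / "out.vcf"
        src.write_text("##fileformat=VCFv4.1\n")
        applied = recalibrate_or_passthrough(failing, noop, src, dst)
        print("recalibration applied:", applied)
```

The drawback is that the passed-through VCF carries no recalibration annotations or filters, so downstream steps would have to tolerate that.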



