VariantRecalibrator: ERROR stack trace

cgecge Posts: 3 Member
edited December 2012 in Ask the GATK team

Hi, I'm encountering this error when running VariantRecalibrator on data from 3 samples (I'm testing). Could the problem be due to the small sample size?

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
java.lang.NullPointerException
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.selectWorstVariants(VariantDataManager.java:179)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:306)
        at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.onTraversalDone(VariantRecalibrator.java:107)
        at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:97)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:94)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.2-16-g9f648cb):
##### ERROR
##### ERROR Please visit the wiki to see if this is a known problem
##### ERROR If not, please post the error, with stack trace, to the GATK forum
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

Thanks

Answers

  • Geraldine_VdAuwera Posts: 6,413 Administrator, GATK Developer

    That is possible, although the program should tell you explicitly that the dataset is too small rather than fail like this. Can you tell me what your command line is and what your dataset looks like?

    Geraldine Van der Auwera, PhD

  • ispiras Posts: 3

    Thanks for your answer. My command is:

    java -Xmx24g -jar ${GATK} \
        -T VariantRecalibrator \
        -R $REF \
        -input test_3_samples.mark_dup.indel.rc.bam.raw.vcf \
        -resource:hapmap,known=false,training=true,truth=true,prior=15.0 $ref_dir/hapmap_3.3.b37.sites.vcf \
        -resource:omni,known=false,training=true,truth=false,prior=12.0 $ref_dir/1000G_omni2.5.b37.sites.vcf \
        -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 $ref_dir/dbSNP137.vcf \
        -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ \
        --maxGaussians 4 \
        --percentBadVariants 0.05 \
        --minNumBadVariants 1000 \
        -mode SNP \
        -recalFile $final_dir/output.recal \
        -tranchesFile $final_dir/output.tranches \
        -rscriptFile $final_dir/output.plots.R \

    Moreover, before the error message that I posted yesterday, there is this warning:

    [......]
    INFO 11:59:16,223 VariantRecalibratorEngine - Evaluating full set of 1072 variants...
    INFO 11:59:16,223 VariantDataManager - Found 0 variants overlapping bad sites training tracks.
    WARN 11:59:16,224 VariantDataManager - WARNING: Training with very few variant sites! Please check the model reporting PDF to ensure the quality of the model is reliable.
    INFO 11:59:17,158 GATKRunReport - Uploaded run statistics report to AWS S3
    [.......]

    Thanks

  • ispiras Posts: 3

    The data are from targeted resequencing (1 gene, approximately 300 kb).

  • Geraldine_VdAuwera Posts: 6,413 Administrator, GATK Developer

    Ah, yes that makes sense. That is not enough data for variant recalibration. You should use hard filtering; please see the Best Practices documentation for our recommendations for small datasets.

    Geraldine Van der Auwera, PhD
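
For readers unfamiliar with the term, hard filtering means flagging each variant against fixed annotation thresholds instead of fitting a statistical model, which is why it still works on a dataset of about a thousand variants. The Python sketch below only illustrates that logic. It is not GATK code, the function names are hypothetical, and the thresholds are assumed values of the kind commonly quoted for SNPs, not numbers taken from this thread; check the Best Practices documentation for the recommendations that apply to your GATK version.

```python
# Illustrative only: not GATK code. Threshold values are assumptions and
# should be checked against the Best Practices documentation.
SNP_HARD_FILTERS = {
    "QD": lambda v: v < 2.0,               # low quality normalized by depth
    "FS": lambda v: v > 60.0,              # strong strand bias
    "MQ": lambda v: v < 40.0,              # low mapping quality
    "HaplotypeScore": lambda v: v > 13.0,  # poor fit to two haplotypes
    "MQRankSum": lambda v: v < -12.5,      # alt reads map much worse than ref reads
    "ReadPosRankSum": lambda v: v < -8.0,  # alt alleles cluster near read ends
}


def failed_filters(info):
    """Return the annotation names whose values fail their threshold.

    `info` maps annotation names to numeric values for one variant.
    Annotations missing from `info` are skipped in this sketch.
    """
    return [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in info and fails(info[name])
    ]


def filter_status(info):
    """Return "PASS" or a semicolon-separated list of failed filters."""
    failed = failed_filters(info)
    return "PASS" if not failed else ";".join(failed)


if __name__ == "__main__":
    print(filter_status({"QD": 12.3, "FS": 1.2, "MQ": 59.0}))
    print(filter_status({"QD": 1.5, "FS": 75.0, "MQ": 59.0}))
```

Each variant is judged independently of the others, so the result does not depend on how many variants the dataset contains. That is the property that makes this approach usable where the recalibration model lacks enough training sites.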
