Bug Bulletin: The recent 3.2 release fixes many issues. If you run into a problem, please try the latest version before posting a bug report, as your problem may already have been solved.

VariantRecalibrator - java.lang.NumberFormatException: For input string: "."

LaviniaLavinia Posts: 37Member

Hello, I am just trying VariantRecalibrator on my 4 samples:

java -jar GenomeAnalysisTK.jar -T VariantRecalibrator -R gatk.ucsc.mm10.fa -input UnifiedGenotyper.output.snps.raw.vcf -nt 14 -recalFile file_for_ApplyRecalibration.recal -tranchesFile file_for_ApplyRecalibration.tranches -resource:sanger,known=false,training=true,truth=true mgp.v2.snps.annot.reformat.vcf -resource:dbnsp,known=true,training=false,truth=false,prior=6.0 mm10_dbsnp.vcf -an QD -an HaplotypeScore -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an InbreedingCoeff -mG 4 -percentBad 0.05

which starts running then gives me this error: INFO 08:26:57,741 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NumberFormatException: For input string: "." at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:481) at java.lang.Integer.valueOf(Integer.java:582) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.decodeInts(AbstractVCFCodec.java:680) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:641) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:92) at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:130) at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:120) at org.broadinstitute.sting.utils.variantcontext.GenotypesContext.iterator(GenotypesContext.java:461) at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:922) at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:908) at org.broadinstitute.sting.utils.variantcontext.VariantContext.isMonomorphicInSamples(VariantContext.java:937) at org.broadinstitute.sting.utils.variantcontext.VariantContext.isPolymorphicInSamples(VariantContext.java:948) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.isValidVariant(VariantDataManager.java:278) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.parseTrainingSets(VariantDataManager.java:263) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:259) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:107) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:243) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:231) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:120) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:67) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:23) at org.broadinstitute.sting.gatk.executive.ShardTraverser.call(ShardTraverser.java:73) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "."
ERROR ------------------------------------------------------------------------------------------

I've used all three VCFs in other GATK tools without issues. Any help greatly appreciated!, many thanks, Lavinia.

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 5,988Administrator, GATK Developer admin

    Can you validate your vcf, just to make sure there's nothing wrong with the file itself?

    You should also try running with the latest version.

    Geraldine Van der Auwera, PhD

  • LaviniaLavinia Posts: 37Member

    Hi Geraldine, thanks so much for the prompt response. We are not upgrading our version to make sure that we don't run into any issues with analysis for academic/commercial analysis (99.9% of our work is academic but there is a chance some of it could cross over). Using vcftools validation, two VCFs validate immediately with no issues, the third complains with a warning: The header tag 'contig' not present for CHROM=chr1. (Not required but highly recommended.) (for every chromosome). I'll add these headers in and repeat, and comment back on any progress, thank you.

  • LaviniaLavinia Posts: 37Member

    Hi Geraldine, I have corrected the one VCF with the header issues but unfortunately am still getting exactly the same error:

    INFO 14:31:23,744 ProgressMeter - chr1:18999977 8.42e+05 36.0 s 42.7 s 0.7% 86.2 m 85.6 m WARN 14:31:23,901 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately 3888 seconds. Retrying connection. INFO 14:31:24,600 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "." at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Integer.parseInt(Integer.java:481) at java.lang.Integer.valueOf(Integer.java:582) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.decodeInts(AbstractVCFCodec.java:680) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:641) at org.broadinstitute.sting.utils.codecs.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:92) at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:130) at org.broadinstitute.sting.utils.variantcontext.LazyGenotypesContext.getGenotypes(LazyGenotypesContext.java:120) at org.broadinstitute.sting.utils.variantcontext.GenotypesContext.iterator(GenotypesContext.java:461) at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:922) at org.broadinstitute.sting.utils.variantcontext.VariantContext.getCalledChrCount(VariantContext.java:908) at org.broadinstitute.sting.utils.variantcontext.VariantContext.isMonomorphicInSamples(VariantContext.java:937) at org.broadinstitute.sting.utils.variantcontext.VariantContext.isPolymorphicInSamples(VariantContext.java:948) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.isValidVariant(VariantDataManager.java:278) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantDataManager.parseTrainingSets(VariantDataManager.java:263) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:259) at org.broadinstitute.sting.gatk.walkers.variantrecalibration.VariantRecalibrator.map(VariantRecalibrator.java:107) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:243) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:231) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:248) at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:120) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:67) at org.broadinstitute.sting.gatk.traversals.TraverseLociNano.traverse(TraverseLociNano.java:23) at org.broadinstitute.sting.gatk.executive.ShardTraverser.call(ShardTraverser.java:73) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "."
    ERROR ------------------------------------------------------------------------------------------

    I'll try the VariantRecalibrator tool on some other data and have a closer look at the VCFs, maybe there is some quirk in them that is causing issues, thanks.

  • LaviniaLavinia Posts: 37Member

    Hi, sorry just discovered ValidateVariants (!). This gives me:

    File mgp.v2.snps.annot.reformat.vcf fails strict validation: the REF allele is incorrect for the record at position chr1:3000054, fasta says G vs. VCF says C

    I'll check this and update, thanks.

  • LaviniaLavinia Posts: 37Member

    Hi Geraldine, thanks for that. I think it must be something within one of the VCFs, and exactly as you said, some information is missing. I'll try and chase this up and post my findings, many thanks.

  • LaviniaLavinia Posts: 37Member

    Ok, so I think your suggestions have helped me find the problem, the VCF causing issues is from ftp://ftp-mouse.sanger.ac.uk/REL-1211-SNPs_Indels/README, so it is a valid VCF but there must be differences between the reference sequence used by sanger and the reference sequence that I have used. I'll download their reference, re-run my pipeline to use that and then retry VariantRecalibrator. If this isn't the issue then I'll re-post, if not, consider the problem solved!, thanks.

Sign In or Register to comment.