VariantAnnotator error: java.lang.NumberFormatException: For input string: "."

syoung, Olympia, WA (Member)

Hi, I am going through the GATK Best Practices pipeline for the first time. I have non-human data and am at the variant discovery step in the pipeline. I don't have well-vetted 'known' variant sites, but I have pieced together a VCF of high-likelihood SNPs that are common to several data sets produced in our lab over the past year. The VCF has only four basic annotations, and the software that produced it doesn't appear to offer a way to generate others, so I am trying to add a few with VariantAnnotator. My command is:
java -jar GenomeAnalysisTK.jar \
-R Oncorhynchus_mykiss_chr.fa \
-T VariantAnnotator \
-I Om2013all-rg.bam \
-A StrandOddsRatio \
-A MappingQualityRankSumTest \
-A ReadPosRankSumTest \
-o Om2013bElp.vcf \
-V Om2013bEl.vcf
When I run it I get an error:

ERROR stack trace

java.lang.NumberFormatException: For input string: "."
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
at java.lang.Double.parseDouble(Double.java:540)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:251)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:81)
at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:715)
at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:128)
at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:347)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:60)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:79)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:41)
at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:40)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:502)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.<init>(IndexFactory.java:403)
at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:312)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:401)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:287)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:224)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:147)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1047)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:828)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "."
ERROR ------------------------------------------------------------------------------------------

I've read everything I can find on VariantAnnotator on the GATK website but haven't found anything helpful yet. I am running
The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12.

I suspect that I am doing something wrong but can't track it down. Any ideas about what is or isn't going on?

Thanks,
Sewall

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    My guess is that your VCF is malformed. Some annotation programs output VCFs that don't conform to the spec. Can you post the header lines showing the definitions of the annotations, and a few VCF records showing annotation values?

  • syoung, Olympia, WA (Member)

    Geraldine

    Thanks for the prompt response. Here are the header and a few lines of data.

    fileformat=VCFv4.0

    fileDate=20151003

    source="Stacks v1.30"

    INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">

    INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">

    FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

    FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">

    FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">

    FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihood">

    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 11DN0004-OmyWGS_2014 11DN0005-OmyWGS_2014 11DN0006-OmyWGS_2014 11DN0008-OmyWGS_2014 11DN0011-OmyWGS_2014 11DN0013-OmyWGS_2014 11DN0017-OmyWGS_2014 MYkALH12_F003-OmyWGS_2014 MYkALH12_M001-OmyWGS_2014 MYkALN12_F001-OmyWGS_2014 MYkALN12_F002-OmyWGS_2014 MYkALN12_M001-OmyWGS_2014 MYkALN12_M004-OmyWGS_2014 MYkALN12_M005-OmyWGS_2014 MyReit12x_0012-OmyWGS_2014 13BL0003-OmyWGS_2014 13BL0007-OmyWGS_2014 MyToku12x_0001-OmyWGS_2014 MyToku12x_0013-OmyWGS_2014

    chrUn_1 19860 21518 C T . PASS NS=19;AF=0.974,0.026 GT:DP:AD:GL 0/0:55:55,0:.,76.25,. 0/0:38:38,0:.,52.68,. 0/0:44:44,0:.,61,. 0/0:33:33,0:.,45.75,. 0/0:27:27,0:.,37.43,. 0/0:37:37,0:.,51.29,. 1/0:62:36,26:.,85.95,. 0/0:42:42,0:.,58.22,. 0/0:34:34,0:.,47.13,. 0/0:31:31,0:.,42.98,. 0/0:34:34,0:.,47.13,. 0/0:15:15,0:.,20.79,. 0/0:25:25,0:.,34.66,. 0/0:40:40,0:.,55.45,. 0/0:22:22,0:.,30.5,. 0/0:41:41,0:.,56.84,. 0/0:51:51,0:.,70.7,. 0/0:25:25,0:.,34.66,. 0/0:12:12,0:.,16.64,.
    chrUn_1 44097 23124 C A . PASS NS=19;AF=0.500,0.500 GT:DP:AD:GL 0/1:32:12,20:.,44.36,. 0/1:34:12,22:.,47.13,. 0/1:32:12,20:.,44.36,. 0/1:19:12,7:.,26.34,. 0/1:24:9,15:.,33.27,. 0/1:37:14,23:.,51.29,. 0/1:18:8,10:.,24.95,. 0/1:30:9,21:.,41.59,. 0/1:13:6,7:.,18.02,. 0/1:29:8,21:.,41.59,. 0/1:45:13,32:.,63.77,. 0/1:20:11,9:.,27.73,. 0/1:44:11,33:.,61,. 0/1:16:6,10:.,22.18,. 0/1:41:18,23:.,56.84,. 0/1:41:19,22:.,56.84,. 0/1:45:18,27:.,62.38,. 0/1:51:22,29:.,70.7,. 0/1:33:13,20:.,45.75,.
    chrUn_1 44181 23123 G A . PASS NS=19;AF=0.921,0.079 GT:DP:AD:GL 0/0:44:44,0:.,61,. 0/0:38:38,0:.,52.68,. 0/0:38:25,0:.,54.07,. 0/0:30:30,0:.,41.59,. 0/0:29:12,0:.,40.2,. 1/0:34:13,21:.,48.52,. 0/0:37:37,0:.,51.29,. 0/0:42:42,0:.,58.22,. 0/0:34:34,0:.,47.13,. 0/0:39:24,0:.,54.07,. 0/0:34:15,0:.,47.13,. 0/0:26:7,0:.,36.04,. 1/0:30:15,15:.,41.59,. 1/0:41:23,18:.,56.84,. 0/0:40:19,0:.,55.45,. 0/0:43:22,0:.,59.61,. 0/0:45:45,0:.,62.38,. 0/0:31:31,0:.,42.98,. 0/0:24:24,0:.,33.27,.
    chrUn_1 44204 23123 G A . PASS NS=19;AF=0.526,0.474 GT:DP:AD:GL 1/1:44:0,44:.,61,. 1/1:38:0,38:.,52.68,. 0/0:38:25,0:.,54.07,. 1/1:30:0,30:.,41.59,. 1/0:29:12,17:.,40.2,. 1/0:34:13,21:.,48.52,. 1/1:37:0,37:.,51.29,. 0/0:42:42,0:.,58.22,. 0/0:34:34,0:.,47.13,. 1/0:39:24,15:.,54.07,. 0/0:34:15,0:.,47.13,. 1/0:26:7,19:.,36.04,. 1/0:30:15,15:.,41.59,. 1/1:41:0,23:.,56.84,. 0/0:40:19,0:.,55.45,. 1/0:43:22,21:.,59.61,. 1/1:45:0,45:.,62.38,. 0/0:31:31,0:.,42.98,. 0/0:24:24,0:.,33.27,.
    chrUn_1 44217 23123 C T . PASS NS=19;AF=0.789,0.211 GT:DP:AD:GL 0/0:44:44,0:.,61,. 0/0:38:38,0:.,52.68,. 0/1:38:13,25:.,54.07,. 0/0:30:30,0:.,41.59,. 0/1:29:17,12:.,40.2,. 0/1:34:21,13:.,48.52,. 0/0:37:37,0:.,51.29,. 0/0:42:42,0:.,58.22,. 0/0:34:34,0:.,47.13,. 0/0:39:24,0:.,54.07,. 0/1:34:19,15:.,47.13,. 0/1:26:19,7:.,36.04,. 0/0:30:15,0:.,41.59,. 0/0:41:23,0:.,56.84,. 0/1:40:21,19:.,55.45,. 0/0:43:22,0:.,59.61,. 0/0:45:45,0:.,62.38,. 1/1:31:0,31:.,42.98,. 0/0:24:24,0:.,33.27,.

  • syoung, Olympia, WA (Member)

    It looks like the leading "##" got dropped when I pasted in the snippet. The header lines begin with "##" except the "# CHROM .." line.

  • Sheila, Broad Institute (Member, Broadie admin)

    @syoung
    Hi Sewall,

    Can you tell me how you generated the input VCF?

    Thanks,
    Sheila

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    I'm guessing this was produced by Stacks since the source line says source="Stacks v1.30".

    The error is happening when we're trying to parse the GL field which is defined by the program in the header as:

    FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihood">
    

    The program is trying to parse a number but finding a . instead. I'm not sure if this is a case of the field type being specified incorrectly in the header, or . not being allowed for a missing GL value. You can try filling in number values in one or the other and see whether one fixes the error.
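Before editing values by hand, it may help to see how widespread the problem is. The following is a minimal Python sketch, not a GATK tool, that lists every sample whose GL value is a comma-separated list containing a "." entry (such as `.,76.25,.` in the records above). It assumes the real file is tab-delimited, as the VCF format requires. The function name `find_partial_missing`, the sample names `S1` and `S2`, and the second sample's GL values are made up for the demonstration. The sketch only reports offending values; it does not repair them, and whether replacing them resolves the GATK error has not been confirmed in this thread.

```python
def find_partial_missing(vcf_lines, key="GL"):
    """Yield (chrom, pos, sample, raw_value) for each sample whose `key`
    field is a comma-separated list containing at least one '.' entry."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # record has no genotype columns
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP:AD:GL
        if key not in keys:
            continue
        idx = keys.index(key)
        for i, sample_field in enumerate(fields[9:]):
            parts = sample_field.split(":")
            if idx >= len(parts):
                continue  # trailing fields may be omitted for a sample
            values = parts[idx].split(",")
            if len(values) > 1 and "." in values:
                name = samples[i] if i < len(samples) else f"sample{i}"
                yield fields[0], fields[1], name, parts[idx]


# Demonstration: the first sample's values come from the thread;
# the second sample's GL values are invented for contrast.
demo = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n",
    "chrUn_1\t19860\t21518\tC\tT\t.\tPASS\tNS=19\tGT:DP:AD:GL"
    "\t0/0:55:55,0:.,76.25,.\t0/0:38:38,0:1.0,52.68,2.0\n",
]
hits = list(find_partial_missing(demo))
for chrom, pos, sample, value in hits:
    print(chrom, pos, sample, value)
```

To run it on a real file, pass an open file handle instead of the `demo` list, for example `find_partial_missing(open("Om2013bEl.vcf"))`.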

  • syoung, Olympia, WA (Member)

    Thanks to all of you for responding. Your explanations make sense. For my immediate needs I'm better off using hard-filtering.
