

SelectVariants crashes with an error for input string "-IV"

Hi,

I am trying to use GATK's SelectVariants tool to select the variants of a specific sample from a VCF file that contains variants for 476 samples (it is a very large file, 14 GB). I have done this before with smaller VCF files and never had any problems, so I wonder whether the file size is causing the problem.
I'm using Java 1.8.0_151.

Here is the command I ran:
java -d64 -Xmx4g -jar /home/tjs23/apps/GenomeAnalysisTK-3.7/GenomeAnalysisTK.jar \
-T SelectVariants \
-R /data2/genome_builds/c_elegans_WS264/c_elegans.PRJNA13758.WS264.genomic.fa \
-V /scratch/gnelson/strain_bbmap/merged_476_comb_freebayes.vcf \
-o /scratch/gnelson/strain_bbmap/AX3841_homozy_freebayes.vcf \
-sn sample_AX3841 \
-select "vc.getGenotype('sample_AX3841').isHomVar()"

Here is the error message:
INFO 12:19:37,470 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:19:37,472 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
INFO 12:19:37,472 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 12:19:37,473 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 12:19:37,473 HelpFormatter - [Thu May 24 12:19:37 BST 2018] Executing on Linux 4.4.0-53-generic amd64
INFO 12:19:37,473 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12
INFO 12:19:37,475 HelpFormatter - Program Args: -T SelectVariants -R /data2/genome_builds/c_elegans_WS264/c_elegans.PRJNA13758.WS264.genomic.fa -V /scratch/gnelson/strain_bbmap/merged_476_comb_freebayes.vcf -o /scratch/gnelson/strain_bbmap/AX3841_homozy_freebayes.vcf -sn sample_AX3841 -select vc.getGenotype('sample_AX3841').isHomVar()
INFO 12:19:37,478 HelpFormatter - Executing as [redacted] on Linux 4.4.0-53-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12.
INFO 12:19:37,479 HelpFormatter - Date/Time: 2018/05/24 12:19:37
INFO 12:19:37,479 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:19:37,479 HelpFormatter - --------------------------------------------------------------------------------
INFO 12:19:37,494 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:19:37,564 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000

ERROR --
ERROR stack trace

java.lang.NumberFormatException: For input string: "-IV"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:260)
at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:90)
at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:715)
at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:129)
at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:348)
at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:280)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:258)
at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:61)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:74)
at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:36)
at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:43)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:508)
at htsjdk.tribble.index.IndexFactory$FeatureIterator.next(IndexFactory.java:470)
at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:344)
at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:307)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:441)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:327)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1052)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:829)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:287)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: For input string: "-IV"
ERROR ------------------------------------------------------------------------------------------

I tried to locate the string "-IV" in the file with grep but could not find it. I also extracted random subsets of the original VCF file, and the command worked fine on those.
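The stack trace fails inside htsjdk's `GenotypeLikelihoods.fromGLField` / `parseDeprecatedGLString`, which suggests the offending value sits in some sample's GL (genotype likelihood) field. The `createIndexInMemory` frames also suggest GATK was building an index for the VCF on the fly, which parses every record regardless of `-sn` or `-select`. That could explain why random subsets work unless they happen to contain the bad record. One possible reason grep came up empty: if the pattern was passed as `grep "-IV" file`, grep treats `-IV` as the options `-I` and `-V` (and just prints its version) instead of searching. `grep -n -e '-IV' file` or `grep -n -- '-IV' file` searches for the literal string. As an alternative, here is a minimal Python sketch (the function and script names are illustrative, not part of GATK) that streams the VCF and reports GL values that would not parse as Java doubles, plus lines whose column count differs from the `#CHROM` header. The regex is only an approximation of what `Double.parseDouble` accepts.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator, List, Optional, Tuple

# Rough approximation of the decimal strings java.lang.Double.parseDouble
# accepts (hex floats and surrounding whitespace are not handled here).
# Python's float() is more lenient (it accepts "inf"/"-inf", which Java
# rejects), so a plain float() check could miss problem values.
JAVA_DOUBLE = re.compile(r"^[+-]?(NaN|Infinity|(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?)[fFdD]?$")


def find_bad_gl(lines: Iterable[str]) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (line_number, "CHROM:POS", sample, problem) for suspicious records."""
    samples: List[str] = []
    n_cols: Optional[int] = None
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples, n_cols = cols[9:], len(cols)
            continue
        site = ":".join(cols[:2])
        if n_cols is not None and len(cols) != n_cols:
            # A truncated or merged line shifts values into the wrong columns.
            yield line_no, site, "-", f"expected {n_cols} columns, found {len(cols)}"
            continue
        if len(cols) < 10:
            continue
        fmt = cols[8].split(":")
        if "GL" not in fmt:
            continue
        gl_idx = fmt.index("GL")
        for sample, field in zip(samples, cols[9:]):
            values = field.split(":")
            if gl_idx >= len(values) or values[gl_idx] == ".":
                continue  # trailing FORMAT fields may be dropped; "." is missing
            if not all(JAVA_DOUBLE.match(tok) for tok in values[gl_idx].split(",")):
                yield line_no, site, sample, f"GL={values[gl_idx]}"


def main(path: str) -> None:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for hit in find_bad_gl(handle):
            print("\t".join(str(x) for x in hit))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage (hypothetical script name): `python find_bad_gl.py merged_476_comb_freebayes.vcf`. Each output line gives the VCF line number, CHROM:POS, the sample, and the value or column-count problem, which can then be inspected directly with `sed -n '<line>p'`.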

Please let me know if you need me to provide any further details.

Many thanks,
Paula


Answers

  • I meant to say that, unfortunately, I could not find "-IV" in my large VCF file. Apologies for the typo.

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    edited May 2018

    @PFPritchett
    Hi Paula,

    I wonder whether the tool is having trouble because the VCF was not produced by GATK. Have you run SelectVariants successfully on FreeBayes VCFs before? Can you try running ValidateVariants on the file?

    Thanks,
    Sheila

  • Hi Sheila,

    Thanks for responding.
    I agree that generating this file with FreeBayes might have caused compatibility issues, but unfortunately I can't pinpoint which. I have run SelectVariants on FreeBayes-generated files in the past without any trouble, which is why I expected it to work again.
    I followed your suggestion and ran ValidateVariants, and got the same error message:
    java -d64 -Xmx4g -jar $GATK -T ValidateVariants -V ~/debono-pc-27.lmb.internal/scratch/gnelson/strain_bbmap/merged_476_comb_freebayes.vcf -R ~/debono-pc-27.lmb.internal/data2/genome_builds/c_elegans_WS264/c_elegans.PRJNA13758.WS264.genomic.fa
    INFO 15:07:28,674 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:07:28,676 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.6-0-g89b7209, Compiled 2016/06/01 22:27:29
    INFO 15:07:28,676 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO 15:07:28,676 HelpFormatter - For support and documentation go to https://www.broadinstitute.org/gatk
    INFO 15:07:28,676 HelpFormatter - [Mon Jun 04 15:07:28 BST 2018] Executing on Linux 4.9.10-040910-generic amd64
    INFO 15:07:28,676 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_92-b15 JdkDeflater
    INFO 15:07:28,679 HelpFormatter - Program Args: -T ValidateVariants -V /home/paulafp/debono-pc-27.lmb.internal/scratch/gnelson/strain_bbmap/merged_476_comb_freebayes.vcf -R /home/paulafp/debono-pc-27.lmb.internal/data2/genome_builds/c_elegans_WS264/c_elegans.PRJNA13758.WS264.genomic.fa
    INFO 15:07:28,683 HelpFormatter - Executing as [redacted] on Linux 4.9.10-040910-generic amd64; OpenJDK 64-Bit Server VM 1.8.0_92-b15.
    INFO 15:07:28,684 HelpFormatter - Date/Time: 2018/06/04 15:07:28
    INFO 15:07:28,684 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:07:28,684 HelpFormatter - --------------------------------------------------------------------------------
    INFO 15:07:28,697 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:07:28,840 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000

    ERROR --
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "-IV"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2043)
    at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.lang.Double.parseDouble(Double.java:538)
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.parseDeprecatedGLString(GenotypeLikelihoods.java:260)
    at htsjdk.variant.variantcontext.GenotypeLikelihoods.fromGLField(GenotypeLikelihoods.java:90)
    at htsjdk.variant.vcf.AbstractVCFCodec.createGenotypeMap(AbstractVCFCodec.java:714)
    at htsjdk.variant.vcf.AbstractVCFCodec$LazyVCFGenotypesParser.parse(AbstractVCFCodec.java:128)
    at htsjdk.variant.variantcontext.LazyGenotypesContext.decode(LazyGenotypesContext.java:158)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:347)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:60)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:74)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:36)
    at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:43)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:493)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.next(IndexFactory.java:455)
    at htsjdk.tribble.index.IndexFactory.createIndex(IndexFactory.java:329)
    at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:303)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:441)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:327)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:264)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:153)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.<init>(ReferenceOrderedDataSource.java:208)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.<init>(ReferenceOrderedDataSource.java:88)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1047)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:824)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:282)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions https://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "-IV"
    ERROR ------------------------------------------------------------------------------------------

    What really surprises me is that I cannot find "-IV" anywhere in the input VCF file.

    For now, I'm parsing the VCF myself in Python. It's not ideal, but I couldn't get SelectVariants to work.

    Thanks again for your help!
    Paula
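Since the workaround described in the last reply is parsing the VCF in Python, here is a minimal sketch of a streaming filter that is similar in spirit to the `-sn sample_AX3841 -select "vc.getGenotype('sample_AX3841').isHomVar()"` command. `is_hom_var` approximates htsjdk's `Genotype.isHomVar()`: a fully called genotype whose alleles are all the same non-reference allele. `is_hom_var` and `select_hom_var` are illustrative names, not GATK functions. Unlike SelectVariants, this sketch leaves the INFO column and ALT alleles untouched, so site-level annotations such as AC/AN still describe all 476 samples. It also silently skips lines with too few columns instead of reporting them.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator


def is_hom_var(gt: str) -> bool:
    """Approximate htsjdk Genotype.isHomVar() for a VCF GT string such as "1/1"."""
    alleles = re.split(r"[/|]", gt)
    if not gt or "." in alleles:
        return False  # no-call or partially called genotype
    return len(set(alleles)) == 1 and alleles[0] != "0"


def select_hom_var(lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield a single-sample VCF keeping only sites where `sample` is hom-var."""
    col = None
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            yield line  # meta-information lines are passed through unchanged
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            col = cols.index(sample)  # raises ValueError if the sample is absent
            yield "\t".join(cols[:9] + [cols[col]])
            continue
        if col is None or len(cols) <= col:
            continue  # no header seen yet, or a truncated line
        fmt = cols[8].split(":")
        values = cols[col].split(":")
        if "GT" in fmt and fmt.index("GT") < len(values) and is_hom_var(values[fmt.index("GT")]):
            yield "\t".join(cols[:9] + [cols[col]])


def main(path: str, sample: str) -> None:
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for out in select_hom_var(handle, sample):
            print(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

For example, `python select_hom_var.py merged_476_comb_freebayes.vcf sample_AX3841 > AX3841_homozy.vcf` (hypothetical script name). Because it reads the GT subfield only, a malformed GL value in another sample's column does not stop it. The drawback is that such problems also go unnoticed.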
