ERROR MESSAGE: Unable to parse header with error: For input string: "R",

this command....

java -Xmx1024m -cp /cm/shared/apps/gatk/3.2-2/GenomeAnalysisTK.jar org.broadinstitute.gatk.tools.CatVariants -R /data/ngseq/ref/b37_automasked_arms.fa -V /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/1_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf -V /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/2_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf -out /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/human_varSites_allBaits_allSamples_BQ20_MQ30_test.vcf

...returns this error...

INFO 11:29:59,871 HelpFormatter - -------------------------------------------------------
INFO 11:29:59,873 HelpFormatter - Program Name: org.broadinstitute.gatk.tools.CatVariants
INFO 11:29:59,876 HelpFormatter - Program Args: -R /data/ngseq/ref/b37_automasked_arms.fa -V /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/1_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf -V /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/2_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf -out /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/human_varSites_allBaits_allSamples_BQ20_MQ30_test.vcf
INFO 11:29:59,880 HelpFormatter - Executing as [email protected] on Linux 2.6.32-358.11.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17.
INFO 11:29:59,880 HelpFormatter - Date/Time: 2015/07/10 11:29:59
INFO 11:29:59,880 HelpFormatter - -------------------------------------------------------
INFO 11:29:59,880 HelpFormatter - -------------------------------------------------------

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: For input string: "R", for input source: /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/1_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:203)
at htsjdk.tribble.TribbleIndexedFeatureReader.<init>(TribbleIndexedFeatureReader.java:91)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:89)
at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:66)
at org.broadinstitute.gatk.tools.CatVariants.getFeatureReader(CatVariants.java:191)
at org.broadinstitute.gatk.tools.CatVariants.execute(CatVariants.java:258)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.tools.CatVariants.main(CatVariants.java:317)
Caused by: java.lang.NumberFormatException: For input string: "R"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.valueOf(Integer.java:766)
at htsjdk.variant.vcf.VCFCompoundHeaderLine.<init>(VCFCompoundHeaderLine.java:171)
at htsjdk.variant.vcf.VCFFormatHeaderLine.<init>(VCFFormatHeaderLine.java:49)
at htsjdk.variant.vcf.AbstractVCFCodec.parseHeaderFromLines(AbstractVCFCodec.java:211)
at htsjdk.variant.vcf.VCFCodec.readActualHeader(VCFCodec.java:111)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:88)
at htsjdk.tribble.AsciiFeatureCodec.readHeader(AsciiFeatureCodec.java:41)
at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:201)
... 8 more

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.2-2-gec30cee):
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR MESSAGE: Unable to parse header with error: For input string: "R", for input source: /scratch/ngseq/dz45/NGS/4-variant_calling/region_vcfs/1_human_varSites_allBaits_allSamples_BQ20_MQ30_gene_conv_3020_gene_conv_3020_human.vcf
ERROR ------------------------------------------------------------------------------------------

So somewhere in my VCF it's expecting a number and gets an "R", right?
Any ideas how to locate this "R"?
Can I give you more information?
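
One way to locate the offending value is to scan the VCF header for INFO and FORMAT lines whose Number attribute is not a plain integer, since the stack trace shows the failure happening in `Integer.parseInt` while a FORMAT header line is being built. The following is only a minimal sketch: the function name `find_non_integer_numbers` and the sample header lines are hypothetical, and it assumes the header lines carry the usual `##` prefix.

```python
import re

# Captures the record kind (INFO or FORMAT) and the raw Number value
# from a VCF meta-information line such as ##FORMAT=<ID=...,Number=R,...>.
NUMBER_RE = re.compile(r"^##(INFO|FORMAT)=<.*?\bNumber=([^,>]+)")


def find_non_integer_numbers(lines):
    """Return (line_no, kind, value) for header lines whose Number is not an integer."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        if not line.startswith("##"):
            break  # the header ends at the #CHROM line; no need to read records
        match = NUMBER_RE.match(line)
        if match and not match.group(2).isdigit():
            hits.append((line_no, match.group(1), match.group(2)))
    return hits


# Hypothetical sample header, for illustration only.
sample = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">',
    '##FORMAT=<ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(find_non_integer_numbers(sample))
```

For a real file, passing an open file handle (for example `open("calls.vcf")`, a placeholder name) in place of `sample` should work the same way, because the function only iterates over lines. Note that it also reports `A`, `G` and `.`, which are valid special values; only the ones the parser rejects matter here.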



  • flytrap (uk, Member)

    The VCF (made by mpileup) contains the following lines:

    ##FORMAT=<ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele">

    ##INFO=<ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele">

    I'm guessing these are the "R"s that it's having problems with, right?
    If so, then this may be the wrong forum on which to ask what they mean, but if anyone wants to help me out anyway, I'd be grateful!

  • flytrap (uk, Member)
    edited July 2015

    The VCF specification says:

    "The Number entry is an Integer that describes the number of values that can be included with the INFO field. For example, if the INFO field contains a single number, then this value should be 1; if the INFO field describes a pair of numbers, then this value should be 2 and so on. There are also certain special characters used to define special cases:"

    "If the field has one value for each possible allele (including the reference), then this value should be 'R'"

    ...so that is a valid value?
    Any idea how to make GATK accept my files without significantly changing the meaning of their content?

    If I change the "R"s to "4"s (for the 4 possible alleles; I haven't called indels), it seems to work.
    Do you think I've opened up any problems for myself in the future?

  • Sheila, Broad Institute (Member, Broadie admin)


    I think the best thing to do here is to upgrade to the latest release of GATK. A new version came out just today, so it is best to use that. Number=R should be supported by the latest release.
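
For anyone who cannot upgrade right away, a gentler header edit than replacing "R" with "4" might be to use `Number=.`, which the VCF specification defines for fields whose number of values varies or is unknown. `Number=4` claims exactly four values per record, whereas a per-allele field has one value for the reference plus one per ALT allele at that site, so the count changes from record to record. The sketch below is an assumption-laden illustration, not a tested fix: the helper name `relax_r_number` is hypothetical, and it has not been checked against GATK 3.2.

```python
import re

# Matches Number=R inside a ##INFO or ##FORMAT header line only;
# group 1 keeps everything up to and including "Number=".
R_NUMBER_RE = re.compile(r"^(##(?:INFO|FORMAT)=<.*?\bNumber=)R(?=[,>])")


def relax_r_number(line):
    """Rewrite Number=R as Number=. in a header line; return other lines unchanged."""
    return R_NUMBER_RE.sub(r"\1.", line, count=1)


header = '##FORMAT=<ID=DPR,Number=R,Type=Integer,Description="Number of high-quality bases observed for each allele">'
print(relax_r_number(header))
```

Applying the function to every line of the file and writing the result to a new file leaves the data records untouched, since they never start with `##INFO` or `##FORMAT`. The trade-off is that `.` drops the "one value per allele" information from the header, so restoring `R` after upgrading would be sensible.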

