ERROR MESSAGE: Your input file has a malformed header

amandineld Member
edited April 15 in Ask the GATK team
Hello,
I want to annotate my file with GATK 3.8, but I get this error:

MESSAGE: Your input file has a malformed header: there are not enough columns present in the header line: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AERMQNK-paris-8400326-P6-recipient BEGHUSZ-paris-CIN130255-P5-donor BNAGPIP-paris-SCE240246-P1-donor DKFEYNF-paris-9402755-P24-recipient DUWPYOX-paris-LAP201156-P2-donor ENERRIA-paris-7406853-P13-donor FSCDYAD-paris-08404250-P18-recipient GZISOJR-paris-7400781-P16-donor HDXJQEQ-paris-BOU231086-P8-recipient HPPXKJR-paris-DAS070568-P11-recipient IBMRISR-paris-1012026425-P14-donor JBQRXUU-paris-BAL040960-P4-donor JGYNLLJ-paris-9406986-P22-donor JOJLUPS-paris-CIN020578-P5-recipient KXQMHUU-paris-PAQ030369-P10-donor MFQDFBY-paris-6405545-P19-donor MMGEWHT-paris-7400697-P16-recipient MVDUSTB-paris-BAR131257-P10-recipient MVTISRC-paris-761011046144-P14-recipient NEYXWDZ-paris-RIZ120484-P9-recipient NFAMZIJ-paris-09406539-P22-recipient OENIGXI-paris-6405485-P17-donor ONDAMJK-paris-SCE260544-P1-recipient PAIFUYN-paris-JEA110683-P7-donor QGGDOUC-paris-GIL090756-P3-donor QNIODCU-paris-06405141-P19-recipient RGFUJSJ-paris-7405431-P15-donor RPUJEMI-paris-JEA050373-P7-recipient RZBCDKZ-paris-06400985-P15-recipient SNDXLDU-paris-06404984-P17-recipient TRPLSFT-paris-7401924-P13-recipient UFVLBVS-paris-GIL160150-P3-recipient VOCSMLM-paris-DEL240879-P2-recipient WKCQCMA-paris-9402754-P24-donor WKXMWKI-paris-8404252-P18-donor XUPTJWH-paris-RIZ310548-P9-donor YPOYSQD-paris-DAC250146-P11-donor YYMSZHP-paris-DEM300679-P6-donor ZCRWXBD-paris-BOU280575-P8-donor ZGAQHAH-paris-BAL251159-P4-recipient


After some research, I still don't see what could be causing the problem.
Header of my VCF:

##fileformat=VCFv4.1

##Goby=development (20150227145140)

##INFO=

##INFO=

##FORMAT=

##FORMAT=

##FORMAT=

##FORMAT=

##FORMAT=

##INFO=

##source_20150426.1=vcf-annotate(r797) -a output-21711.tsv.gz -d /scratchLocal/gobyweb/tmp/421024.1.rascals.q/attributes.lst -c CHROM,FROM,TO,INFO/VariantEffectPrediction

##INFO=

##INFO=

##source_20150426.2=vcf-annotate(r797) -a /scratchLocal/gobyweb/ARTIFACT_REPOSITORY-dev/artifacts/ENSEMBL_ANNOTATIONS/ANNOTATIONS/2.1.3/HOMO_SAPIENS/1000GENOMES/37/ref-start-end-gene-hgnc-sorted.tsv.gz -d /scratchLocal/gobyweb/tmp/421024.1.rascals.q/attributes.lst -c CHROM,FROM,TO,INFO/GENE,INFO/GENE_NAME

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AERMQNK-paris-8400326-P6-recipient BEGHUSZ-paris-CIN130255-P5-donor BNAGPIP-paris-SCE240246-P1-donor DKFEYNF-paris-9402755-P24-recipient DUWPYOX-paris-LAP201156-P2-donor ENERRIA-paris-7406853-P13-donor FSCDYAD-paris-08404250-P18-recipient GZISOJR-paris-7400781-P16-donor HDXJQEQ-paris-BOU231086-P8-recipient HPPXKJR-paris-DAS070568-P11-recipient IBMRISR-paris-1012026425-P14-donor JBQRXUU-paris-BAL040960-P4-donor JGYNLLJ-paris-9406986-P22-donor JOJLUPS-paris-CIN020578-P5-recipient KXQMHUU-paris-PAQ030369-P10-donor MFQDFBY-paris-6405545-P19-donor MMGEWHT-paris-7400697-P16-recipient MVDUSTB-paris-BAR131257-P10-recipient MVTISRC-paris-761011046144-P14-recipient NEYXWDZ-paris-RIZ120484-P9-recipient NFAMZIJ-paris-09406539-P22-recipient OENIGXI-paris-6405485-P17-donor ONDAMJK-paris-SCE260544-P1-recipient PAIFUYN-paris-JEA110683-P7-donor QGGDOUC-paris-GIL090756-P3-donor QNIODCU-paris-06405141-P19-recipient RGFUJSJ-paris-7405431-P15-donor RPUJEMI-paris-JEA050373-P7-recipient RZBCDKZ-paris-06400985-P15-recipient SNDXLDU-paris-06404984-P17-recipient TRPLSFT-paris-7401924-P13-recipient UFVLBVS-paris-GIL160150-P3-recipient VOCSMLM-paris-DEL240879-P2-recipient WKCQCMA-paris-9402754-P24-donor WKXMWKI-paris-8404252-P18-donor XUPTJWH-paris-RIZ310548-P9-donor YPOYSQD-paris-DAC250146-P11-donor YYMSZHP-paris-DEM300679-P6-donor ZCRWXBD-paris-BOU280575-P8-donor ZGAQHAH-paris-BAL251159-P4-recipient


1 721427 . C . . . BIOMART_COORDS=1:721427:721427;GENE=ENSG00000230021,ENSG00000237491 GT:BC:GB:FB:Zygosity 0/0:C=187:187:0:homozygous 0/0:C=202:202:2:homozygous 0/0:C=157:157:0:homozygous 0/0:C=153:153:0:homozygous 0/0:C=196:196:0:homozygous 0/0:C=111:111:0:homozygous 0/0:C=176:176:0:homozygous 0/0:C=73:73:0:homozygous 0/0:C=195:195:0:homozygous 0/0:C=199:199:0:homozygous 0/0:C=78:78:0:homozygous 0/0:C=243:243:0:homozygous 0/0:C=65:65:0:homozygous 0/0:C=189:189:0:homozygous 0/0:C=166:166:0:homozygous 0/0:C=245:245:2:homozygous 0/0:C=60:60:0:homozygous 0/0:C=224:224:0:homozygous 0/0:C=84:84:0:homozygous 0/0:C=138:138:0:homozygous 0/0:C=156:156:0:homozygous 0/0:C=84:84:0:homozygous 0/0:C=211:211:0:homozygous 0/0:C=234:234:0:homozygous 0/0:C=247:247:0:homozygous 0/0:C=187:187:0:homozygous 0/0:C=124:124:0:homozygous 0/0:C=201:201:0:homozygous 0/0:C=155:155:0:homozygous 0/0:C=168:168:0:homozygous 0/0:C=99:99:0:homozygous 0/0:C=179:179:0:homozygous 0/0:C=209:209:2:homozygous 0/0:C=91:91:0:homozygous 0/0:C=189:189:0:homozygous 0/0:C=206:206:0:homozygous 0/0:C=167:167:2:homozygous 0/0:C=159:159:0:homozygous 0/0:C=220:220:0:homozygous 0/0:C=222:222:0:homozygous


Thank you in advance.

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @amandineld

    Please try it with the latest version GATK v4.1.1.0 and see if the problem persists.

  • amandineld Member
    @bhanuGandham Yes, I also tried version 4.1.1, but I get the same error.
  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @amandineld

    Would you please post the exact command you are using with GATK 4.1.1.0 and the entire error stack trace it produces?

  • amandineld Member
    Hi, I use the following command:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar gatk-package-4.1.1.0-local.jar VariantAnnotator -R /sandbox/resources/species/human/ensembl/release-75/Homo_sapiens.GRCh37.75.dna.toplevel.fa -V GQPDOMB-stats.vcf -O rsID_GQPDOMB.vcf --dbsnp /sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz

    and here is the terminal output:

    21:51:08.917 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sandbox/users/alecerf-defer/Alloscore_work/GQPDOMB/GATK/gatk-package-4.1.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Apr 11, 2019 9:51:14 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    21:51:14.679 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.680 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.1.1.0
    21:51:14.680 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
    21:51:14.680 INFO VariantAnnotator - Executing as [email protected] on Linux v3.10.0-957.1.3.el7.x86_64 amd64
    21:51:14.681 INFO VariantAnnotator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    21:51:14.681 INFO VariantAnnotator - Start Date/Time: April 11, 2019 9:51:07 PM CEST
    21:51:14.681 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.681 INFO VariantAnnotator - ------------------------------------------------------------
    21:51:14.682 INFO VariantAnnotator - HTSJDK Version: 2.19.0
    21:51:14.682 INFO VariantAnnotator - Picard Version: 2.19.0
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    21:51:14.683 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    21:51:14.683 INFO VariantAnnotator - Deflater: IntelDeflater
    21:51:14.683 INFO VariantAnnotator - Inflater: IntelInflater
    21:51:14.683 INFO VariantAnnotator - GCS max retries/reopens: 20
    21:51:14.683 INFO VariantAnnotator - Requester pays: disabled
    21:51:14.684 WARN VariantAnnotator -

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Warning: VariantAnnotator is a BETA tool and is not yet ready for use in production

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


    21:51:14.684 INFO VariantAnnotator - Initializing engine
    21:51:17.427 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz
    21:51:18.386 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/users/alecerf-defer/Alloscore_work/GQPDOMB/GATK/GQPDOMB-stats.vcf
    21:51:18.429 WARN IndexUtils - Feature file "/sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
    21:51:18.845 INFO VariantAnnotator - Done initializing engine
    21:51:18.919 INFO ProgressMeter - Starting traversal
    21:51:18.919 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    21:51:20.434 INFO VariantAnnotator - Shutting down engine
    [April 11, 2019 9:51:20 PM CEST] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.22 minutes.
    Runtime.totalMemory()=1185939456
    htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 67: unparsable vcf record with allele A-GG, for input source: GQPDOMB-stats.vcf
    at htsjdk.variant.vcf.AbstractVCFCodec.generateException(AbstractVCFCodec.java:801)
    at htsjdk.variant.vcf.AbstractVCFCodec.checkAllele(AbstractVCFCodec.java:593)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseSingleAltAllele(AbstractVCFCodec.java:621)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseAlleles(AbstractVCFCodec.java:560)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:358)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:299)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:277)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:64)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:70)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:37)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.readNextRecord(TribbleIndexedFeatureReader.java:373)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:354)
    at htsjdk.tribble.TribbleIndexedFeatureReader$WFIterator.next(TribbleIndexedFeatureReader.java:315)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:99)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:984)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:138)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
  • srikant_verma India Member

    @amandineld From the error message, it looks like the input VCF is not valid. You can check it with a VCF validator such as GATK's ValidateVariants or EBIvariation/vcf-validator.
    @bhanuGandham Is it advisable at all to use a tool that is still in beta (VariantAnnotator) for a publication?

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    @amandineld I agree with @srikant_verma; it looks like the input VCF is not valid. In such cases, use ValidateVariants to verify that the file adheres to the VCF format.

    @srikant_verma Before we take a tool out of beta, we put it through extensive tests, and we only remove the beta label once we are confident it does what it is intended to do. So if you want to use any beta tool in a publication, you should first confirm with our dev team what the caveats are and then make an informed decision.
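
For readers who hit the same two errors, here is a rough Python pre-check that can be run before ValidateVariants. It is only a sketch, not part of GATK: the function names are made up for illustration, and the allele test is a simplification of the VCF specification (it accepts plain bases, ".", "*", symbolic alleles such as <DEL>, and bracketed breakends, and would wrongly flag single breakends). It looks for the two things reported in this thread: a #CHROM line that does not split into the eight fixed columns on tabs (one common cause is spaces instead of tabs, although the pasted header above does not show which separator the real file uses), and REF/ALT values such as A-GG that are not valid alleles.

```python
import re
import sys

# Fixed columns required on the #CHROM line, in order.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

# Plain-base alleles only; this is a simplification of the VCF spec.
BASES = re.compile(r"^[ACGTNacgtn]+$")


def alt_is_plausible(allele):
    """Return True for '.', '*', symbolic (<ID>), bracketed breakend, or plain-base alleles."""
    if allele in (".", "*"):
        return True
    if allele.startswith("<") and allele.endswith(">"):
        return True
    if "[" in allele or "]" in allele:
        return True
    return bool(BASES.match(allele))


def check_vcf_lines(lines):
    """Yield (line_number, message) for each problem found in an iterable of VCF lines."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and ## meta-information lines.
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if fields[:8] != FIXED_COLUMNS:
                yield number, "header line is not tab-delimited or lacks the 8 fixed columns"
            continue
        if len(fields) < 8:
            yield number, "record has fewer than 8 tab-separated columns"
            continue
        ref, alt = fields[3], fields[4]
        if not BASES.match(ref):
            yield number, f"unexpected REF allele {ref!r}"
        for allele in alt.split(","):
            if not alt_is_plausible(allele):
                yield number, f"unexpected ALT allele {allele!r}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_vcf.py GQPDOMB-stats.vcf
    with open(sys.argv[1]) as handle:
        for number, message in check_vcf_lines(handle):
            print(f"line {number}: {message}")
```

Any line it reports still needs to be fixed at the source (here, presumably in the tool that wrote the VCF), and a clean run of this script does not replace a full validation with ValidateVariants or vcf-validator.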
