Input essai.vcf fails strict validation: The allele with index 1 is not defined in the REF/ALT columns

Hello, I am turning to you after several days of searching. I want to annotate my VCF file with the latest version of GATK. Before that, I want to check that my VCF file is valid, but I keep running into the same error.

Command line:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Xmx5g -jar gatk-package-4.1.2.0-local.jar ValidateVariants -R /sandbox/resources/species/human/ensembl/release-75/Homo_sapiens.GRCh37.75.dna.toplevel.fa -V essai.vcf --dbsnp /sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz

Error:

15:41:58.483 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sandbox/users/alecerf-defer/Alloscore_work/Script_Alloscore_imputation/essai_vcf_sophie/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
May 16, 2019 3:42:00 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:42:00.293 INFO ValidateVariants - ------------------------------------------------------------
15:42:00.294 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
15:42:00.294 INFO ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
15:42:00.294 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-957.1.3.el7.x86_64 amd64
15:42:00.294 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-b12
15:42:00.295 INFO ValidateVariants - Start Date/Time: May 16, 2019 3:41:58 PM CEST
15:42:00.295 INFO ValidateVariants - ------------------------------------------------------------
15:42:00.295 INFO ValidateVariants - ------------------------------------------------------------
15:42:00.296 INFO ValidateVariants - HTSJDK Version: 2.19.0
15:42:00.296 INFO ValidateVariants - Picard Version: 2.19.0
15:42:00.296 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:42:00.296 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:42:00.296 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:42:00.296 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:42:00.296 INFO ValidateVariants - Deflater: IntelDeflater
15:42:00.296 INFO ValidateVariants - Inflater: IntelInflater
15:42:00.296 INFO ValidateVariants - GCS max retries/reopens: 20
15:42:00.296 INFO ValidateVariants - Requester pays: disabled
15:42:00.296 INFO ValidateVariants - Initializing engine
15:42:01.470 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz
15:42:01.639 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/users/alecerf-defer/Alloscore_work/Script_Alloscore_imputation/essai_vcf_sophie/gatk-4.1.2.0/essai.vcf
15:42:01.646 WARN IndexUtils - Feature file "/sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
15:42:01.770 INFO ValidateVariants - Done initializing engine
15:42:01.770 INFO ProgressMeter - Starting traversal
15:42:01.770 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
15:42:02.546 INFO ValidateVariants - Shutting down engine
[May 16, 2019 3:42:02 PM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.07 minutes.
Runtime.totalMemory()=2389704704


A USER ERROR has occurred: Input essai.vcf fails strict validation: The allele with index 1 is not defined in the REF/ALT columns in the record of type:


org.broadinstitute.hellbender.exceptions.UserException$FailsStrictValidation: Input essai.vcf fails strict validation: The allele with index 1 is not defined in the REF/ALT columns in the record of type:
at org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants.apply(ValidateVariants.java:259)
at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:106)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:104)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)

VCF: attached

How can I check and annotate my VCF file?

Thank you in advance.
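
A note on the first error: in the GT field, allele index 0 is the REF allele and index 1 is the first ALT allele, so the message means that some genotype points at an allele the record does not list. To find which records are affected, here is a minimal sketch using only standard awk, assuming an uncompressed, tab-delimited VCF whose FORMAT column contains GT. The function name find_undefined_gt_alleles is hypothetical and is not part of GATK. It prints CHROM, POS and the offending GT for each affected record.

```shell
# Sketch: report records in which a genotype (GT) uses an allele index that
# REF/ALT does not define (index 0 = REF, 1 = first ALT, and so on).
# Assumes an uncompressed, tab-delimited VCF; the function name is hypothetical.
find_undefined_gt_alleles() {
  awk 'BEGIN { FS = "\t" }
    /^#/ { next }                         # skip header lines
    {
      # alleles defined by the record: REF plus each comma-separated ALT
      n_alleles = ($5 == ".") ? 1 : (split($5, alts, ",") + 1)
      # find the position of GT among the colon-separated FORMAT keys
      gt_pos = 0
      n_keys = split($9, keys, ":")
      for (k = 1; k <= n_keys; k++) if (keys[k] == "GT") gt_pos = k
      if (gt_pos == 0) next
      # check every sample column (column 10 onward)
      for (s = 10; s <= NF; s++) {
        split($s, vals, ":")
        # GT alleles are separated by "/" (unphased) or "|" (phased)
        n_gt = split(vals[gt_pos], gt, "[/|]")
        for (g = 1; g <= n_gt; g++) {
          if (gt[g] != "." && gt[g] + 0 >= n_alleles) {
            print $1 "\t" $2 "\t" vals[gt_pos]
            next                          # report each record once
          }
        }
      }
    }' "$1"
}
```

Usage: find_undefined_gt_alleles essai.vcf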

Answers

  • dbecker Munich Member ✭✭✭

    Hi,

    Some of the entries in your VCF don't have an ALT allele, only a ".". In the GT field, allele index 0 is the REF allele and index 1 is the first ALT allele, so a genotype such as 0/1 on a record without an ALT refers to an allele that does not exist. Such a line is not useful and is probably not valid.

    Try removing all entries where ALT = "." and run again.

    Best,
    Daniel
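
The filtering suggested above can be sketched with standard awk, again assuming an uncompressed, tab-delimited VCF. The function name drop_missing_alt and the output file name in the usage line are hypothetical. Header lines are kept unchanged; every record whose ALT column (the 5th field) is exactly "." is dropped.

```shell
# Sketch: keep every header line (starting with "#") and drop every record
# whose ALT column (5th tab-separated field) is exactly ".".
# Assumes an uncompressed, tab-delimited VCF; the function name is hypothetical.
drop_missing_alt() {
  awk 'BEGIN { FS = "\t" } /^#/ || $5 != "."' "$1"
}
```

Usage: drop_missing_alt essai.vcf > essai.filtered.vcf

This removes every ALT = "." record, including any whose genotypes are all 0/0 or ./. and that would have passed validation.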

  • amandineld Member

    Thank you for your answer, but even after removing the variants without an alternative allele, I still get an error:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx5g -jar gatk-package-4.1.2.0-local.jar VariantAnnotator -R /sandbox/resources/species/human/ensembl/release-75/Homo_sapiens.GRCh37.75.dna.toplevel.fa -V essai_id2.vcf --output rsIDessai_id2.vcf --dbsnp /sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz
    14:54:22.435 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/sandbox/users/alecerf-defer/Alloscore_work/Script_Alloscore_imputation/essai_vcf_sophie/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    May 17, 2019 2:54:34 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    14:54:34.132 INFO VariantAnnotator - ------------------------------------------------------------
    14:54:34.132 INFO VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.1.2.0
    14:54:34.132 INFO VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
    14:54:34.133 INFO VariantAnnotator - Executing as [email protected] on Linux v3.10.0-957.1.3.el7.x86_64 amd64
    14:54:34.133 INFO VariantAnnotator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_131-b11
    14:54:34.133 INFO VariantAnnotator - Start Date/Time: May 17, 2019 2:54:20 PM CEST
    14:54:34.133 INFO VariantAnnotator - ------------------------------------------------------------
    14:54:34.133 INFO VariantAnnotator - ------------------------------------------------------------
    14:54:34.135 INFO VariantAnnotator - HTSJDK Version: 2.19.0
    14:54:34.135 INFO VariantAnnotator - Picard Version: 2.19.0
    14:54:34.135 INFO VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    14:54:34.135 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    14:54:34.135 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    14:54:34.135 INFO VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    14:54:34.135 INFO VariantAnnotator - Deflater: IntelDeflater
    14:54:34.135 INFO VariantAnnotator - Inflater: IntelInflater
    14:54:34.135 INFO VariantAnnotator - GCS max retries/reopens: 20
    14:54:34.135 INFO VariantAnnotator - Requester pays: disabled
    14:54:34.136 WARN VariantAnnotator -

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Warning: VariantAnnotator is a BETA tool and is not yet ready for use in production

    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    14:54:34.136 INFO VariantAnnotator - Initializing engine
    14:54:57.028 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz
    14:54:59.905 INFO FeatureManager - Using codec VCFCodec to read file file:///sandbox/users/alecerf-defer/Alloscore_work/Script_Alloscore_imputation/essai_vcf_sophie/gatk-4.1.2.0/essai_id2.vcf
    14:54:59.916 WARN IndexUtils - Feature file "/sandbox/resources/species/human/ensembl/release-75/dbSNP_b150_GRCh37_00-All.vcf.gz" appears to contain no sequence dictionary. Attempting to retrieve a sequence dictionary from the associated index file
    14:55:01.375 INFO VariantAnnotator - Done initializing engine
    14:55:01.692 INFO ProgressMeter - Starting traversal
    14:55:01.692 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    14:55:03.393 INFO VariantAnnotator - Shutting down engine
    [May 17, 2019 2:55:03 PM CEST] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.72 minutes.
    Runtime.totalMemory()=1185939456
    java.lang.IllegalStateException: Key "BIOMART_COORDS found in VariantContext field INFO at 1:783071 but this key isn't defined in the VCFHeader. We require all VCFs to have complete VCF headers by default.
    at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:202)
    at htsjdk.variant.vcf.VCFEncoder.write(VCFEncoder.java:141)
    at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:248)
    at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:220)
    at org.broadinstitute.hellbender.engine.VariantWalker.lambda$traverse$0(VariantWalker.java:106)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
    at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
    at org.broadinstitute.hellbender.engine.VariantWalker.traverse(VariantWalker.java:104)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1039)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)

  • amandineld Member

    Problem solved: a stray double quote (") had slipped into the file and was blocking the program.
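
The second error says that an INFO key used in a record has no matching ##INFO header line. The key in the message is printed as "BIOMART_COORDS, with a leading double quote, which is consistent with the stray character described above. A minimal sketch for listing such keys, assuming an uncompressed, tab-delimited VCF; the function name find_undeclared_info_keys is hypothetical and is not a GATK or htsjdk tool:

```shell
# Sketch: list INFO keys that appear in records but have no matching
# ##INFO=<ID=...> header line. A stray character glued to a key name
# (for example a double quote) shows up here as an undeclared key.
# Assumes an uncompressed, tab-delimited VCF; the function name is hypothetical.
find_undeclared_info_keys() {
  awk 'BEGIN { FS = "\t" }
    /^##INFO=<ID=/ {
      id = $0
      sub(/^##INFO=<ID=/, "", id)         # strip the header prefix
      sub(/[,>].*/, "", id)               # keep the ID, up to the first "," or ">"
      declared[id] = 1
      next
    }
    /^#/ { next }                         # skip all other header lines
    $8 != "." {
      n = split($8, fields, ";")
      for (i = 1; i <= n; i++) {
        key = fields[i]
        sub(/=.*/, "", key)               # keep the key, drop "=value"
        if (!(key in declared) && !(key in seen)) {
          seen[key] = 1
          print key                       # report each undeclared key once
        }
      }
    }' "$1"
}
```

Usage: find_undeclared_info_keys essai_id2.vcf

Any key it prints either needs its own ##INFO header line or, as in this thread, points to a typo or stray character in the record.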

  • SChaluvadi Member, Broadie, Moderator admin

    @dbecker Thank you for the input! @amandineld glad to hear that your issue was resolved!
