
gatk CombineGVCFs, java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 256

Hi,

Please see my GATK log file below. I am trying to use "gatk CombineGVCFs" to merge 3 GVCF files into one GVCF file.

The run fails with the error message "java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 256".

As a result, my output all.g.vcf file was generated, but it contains 0 variants.

Can you please let me know what this error message means and how to resolve it?

Thank you & best regards,
Jie

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @jiehuang001

    Please post the version of GATK you are using, the exact command and the entire stack trace. The image of the stack trace you shared is not clear/viewable.

  • gatk --java-options "-Xmx6g" CombineGVCFs -R $ref -O all.g.vcf --variant gvcf.list
    Using GATK jar /mnt/d/software_lin/gatk/gatk-package-4.1.4.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx6g -jar /mnt/d/software_lin/gatk/gatk-package-4.1.4.0-local.jar CombineGVCFs -R /mnt/d/data/gatk_bundle/hg38//Homo_sapiens_assembly38.fasta.gz -O all.g.vcf --variant gvcf.list
    22:49:52.449 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/d/software_lin/gatk/gatk-package-4.1.4.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Dec 19, 2019 10:49:53 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    22:49:53.492 INFO CombineGVCFs - ------------------------------------------------------------
    22:49:53.494 INFO CombineGVCFs - The Genome Analysis Toolkit (GATK) v4.1.4.0
    22:49:53.494 INFO CombineGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    22:49:53.495 INFO CombineGVCFs - Executing as [email protected] on Linux v4.4.0-17763-Microsoft amd64
    22:49:53.495 INFO CombineGVCFs - Java runtime: OpenJDK 64-Bit Server VM v11.0.4+11-post-Ubuntu-1ubuntu218.04.3
    22:49:53.496 INFO CombineGVCFs - Start Date/Time: December 19, 2019 at 10:49:52 PM GMT
    22:49:53.496 INFO CombineGVCFs - ------------------------------------------------------------
    22:49:53.497 INFO CombineGVCFs - ------------------------------------------------------------
    22:49:53.498 INFO CombineGVCFs - HTSJDK Version: 2.20.3
    22:49:53.502 INFO CombineGVCFs - Picard Version: 2.21.1
    22:49:53.502 INFO CombineGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    22:49:53.523 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    22:49:53.524 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    22:49:53.550 INFO CombineGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    22:49:53.550 INFO CombineGVCFs - Deflater: IntelDeflater
    22:49:53.570 INFO CombineGVCFs - Inflater: IntelInflater
    22:49:53.571 INFO CombineGVCFs - GCS max retries/reopens: 20
    22:49:53.595 INFO CombineGVCFs - Requester pays: disabled
    22:49:53.597 INFO CombineGVCFs - Initializing engine
    22:49:53.869 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/projects/sequencing/gvcf/OPG0005F/OPG0005F.GATK.var.g.vcf
    22:49:53.923 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/projects/sequencing/gvcf/OPG0005M/OPG0005M.GATK.var.g.vcf
    22:49:53.939 INFO FeatureManager - Using codec VCFCodec to read file file:///mnt/d/projects/sequencing/gvcf/OPG0005P/OPG0005P.GATK.var.g.vcf
    22:49:54.132 INFO CombineGVCFs - Done initializing engine
    22:49:54.156 INFO ProgressMeter - Starting traversal
    22:49:54.157 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    22:49:54.193 INFO CombineGVCFs - Shutting down engine
    [December 19, 2019 at 10:49:54 PM GMT] org.broadinstitute.hellbender.tools.walkers.CombineGVCFs done. Elapsed time: 0.03 minutes.
    Runtime.totalMemory()=156237824
    java.lang.ArrayIndexOutOfBoundsException: Index -2 out of bounds for length 256
    at org.broadinstitute.hellbender.utils.BaseUtils.convertIUPACtoN(BaseUtils.java:120)
    at org.broadinstitute.hellbender.utils.fasta.CachingIndexedFastaSequenceFile.getSubsequenceAt(CachingIndexedFastaSequenceFile.java:326)
    at org.broadinstitute.hellbender.engine.ReferenceFileSource.queryAndPrefetch(ReferenceFileSource.java:78)
    at org.broadinstitute.hellbender.engine.ReferenceDataSource.queryAndPrefetch(ReferenceDataSource.java:64)
    at org.broadinstitute.hellbender.engine.ReferenceContext.getBases(ReferenceContext.java:197)
    at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.createIntermediateVariants(CombineGVCFs.java:216)
    at org.broadinstitute.hellbender.tools.walkers.CombineGVCFs.apply(CombineGVCFs.java:162)
    at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:131)
    at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.apply(MultiVariantWalkerGroupedOnStart.java:106)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.lambda$traverse$1(MultiVariantWalker.java:120)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at org.broadinstitute.hellbender.engine.MultiVariantWalker.traverse(MultiVariantWalker.java:118)
    at org.broadinstitute.hellbender.engine.MultiVariantWalkerGroupedOnStart.traverse(MultiVariantWalkerGroupedOnStart.java:163)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited December 2019

    Hi @jiehuang001

    This looks like a potential bug on our end. Can you please provide your input gvcf files so we can reproduce the error and create a bug report for it? This will help us fix the issue. Here are the instructions to share your data with us: https://software.broadinstitute.org/gatk/guide/article?id=1894
    According to the stack trace, the issue appears to be with these three files: OPG0005F.GATK.var.g.vcf, OPG0005M.GATK.var.g.vcf and OPG0005P.GATK.var.g.vcf. If you are unable to send us the entire files, please at least send us snippets with the first few variant records, since the error occurs near the beginning of your files.
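
A note on reading the trace above: the exception is raised in BaseUtils.convertIUPACtoN while reference bases are being fetched, and the failing index is -2 against a table of length 256. One possible explanation, stated here as an assumption and not a confirmed diagnosis, is that a reference byte with value 0xFE was read as a signed Java byte and used as a table index. Since the command passes a compressed reference (.fasta.gz), it may be worth checking how that file is compressed and whether the decoded sequence contains unexpected bytes. The Python sketch below illustrates both checks; the function names are hypothetical helpers, not part of GATK.

```python
def to_signed_byte(value: int) -> int:
    """Interpret an unsigned byte (0..255) the way a signed Java byte would (-128..127)."""
    return value - 256 if value > 127 else value


def find_non_ascii_bytes(data: bytes, limit: int = 10):
    """Return up to `limit` (offset, signed_value) pairs for bytes above 127.

    Plain FASTA text is ASCII, so any hit means non-sequence bytes are present.
    """
    hits = []
    for offset, value in enumerate(data):
        if value > 127:
            hits.append((offset, to_signed_byte(value)))
            if len(hits) >= limit:
                break
    return hits


def compression_kind(header: bytes) -> str:
    """Classify the first bytes of a file as 'uncompressed', 'gzip' or 'bgzf'.

    Heuristic: BGZF blocks are gzip members whose FEXTRA flag is set and
    whose first extra subfield carries the identifier 'BC'.
    """
    if header[:2] != b"\x1f\x8b":
        return "uncompressed"
    has_extra = len(header) > 3 and bool(header[3] & 0x04)
    if has_extra and header[12:14] == b"BC":
        return "bgzf"
    return "gzip"
```

For example, compression_kind(open(path, "rb").read(18)) classifies the start of a reference file, and find_non_ascii_bytes can be run over decompressed sequence text; clean FASTA should give an empty list, while a byte of 0xFE would be reported as -2.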

  • Hi,

    I have put the log file and the first few lines of the GVCF file into a zip file. Here is the Dropbox link to download this small jiehuang.zip file (only 17 KB): https://www.dropbox.com/s/p2smhpm5btsf2cq/jiehuang.zip?dl=0

    BTW, I have now encountered another error, shown below, when I run this command: gatk --java-options "-Xmx6g" GenotypeGVCFs -R Homo_sapiens_assembly38.fasta.gz -O all.vcf.gz --variant mine.gvcf.gz --allow-old-rms-mapping-quality-annotation-data --dbsnp dbsnp_138.hg38.vcf.gz

    A USER ERROR has occurred: The list of input alleles must contain <NON_REF> as an allele but that is not the case at position 10469; please use the Haplotype Caller with gVCF output to generate appropriate records

    Below is the screenshot of a few lines from mine.gvcf.gz:

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @jiehuang001

    1. I have created an issue ticket for the dev team for the ArrayIndexOutOfBoundsException, and you can track its progress here: https://github.com/broadinstitute/gatk/issues/6340

    2. For the most recent error you are seeing, I suspect you are using a vcf instead of a gvcf file. Please verify that mine.gvcf.gz is a gvcf by checking the file header. Take a look at this gvcf doc for more info: https://software.broadinstitute.org/gatk/documentation/article?id=11004 Also, the convention is to use the .g.vcf suffix for gvcf files. Some GATK tools require that suffix.
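
The header check suggested above can be sketched in Python. This assumes a gVCF produced by HaplotypeCaller in GVCF mode, where the header declares the NON_REF symbolic allele and each record lists <NON_REF> among its ALT alleles. The function names are hypothetical, and the script is a quick sanity check, not a validator.

```python
NON_REF = "<NON_REF>"


def header_declares_non_ref(lines) -> bool:
    """True if any header line declares the NON_REF symbolic ALT allele."""
    return any(line.startswith("##ALT=<ID=NON_REF") for line in lines)


def records_missing_non_ref(lines, limit: int = 5):
    """Return up to `limit` (chrom, pos) pairs whose ALT column lacks <NON_REF>."""
    missing = []
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        if NON_REF not in alt.split(","):
            missing.append((chrom, pos))
            if len(missing) >= limit:
                break
    return missing
```

To run it on a compressed file, the lines can be read with gzip.open("mine.gvcf.gz", "rt") from the standard library. If records_missing_non_ref returns positions (such as 10469 from the error message), the file is probably a regular VCF and not a gVCF.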
