LiftoverVCF throws error "java.lang.ArrayIndexOutOfBoundsException: -1"

Hello,

I have been trying to lift over my VCF from the hg19 to the hg38 genome build. However, every time I get the same error: "java.lang.ArrayIndexOutOfBoundsException: -1". Below you can find the command I've used and the error LiftoverVcf throws. I had also fixed the header prior to running LiftoverVcf. From another post, I've read that this may be a bug in the tool and not necessarily something related to my data. Any help on fixing this issue would be highly appreciated.

```
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Home/nlykosko/Software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar LiftoverVcf --INPUT chr6.vcf.gz --OUTPUT chr6_hg38.vcf.gz --REFERENCE_SEQUENCE hg38_chr_only_and_herpes.fa --REJECT rejection_file_chr6.vcf --CHAIN hg19ToHg38.over.chain.gz --WARN_ON_MISSING_CONTIG true --VERBOSITY DEBUG --MAX_RECORDS_IN_RAM 100000
11:29:31.255 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Home/nlykosko/Software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
[Thu Jul 25 11:29:31 CEST 2019] LiftoverVcf --INPUT chr6.vcf.gz --OUTPUT chr6_hg38.vcf.gz --CHAIN hg19ToHg38.over.chain.gz --REJECT rejection_file_chr6.vcf --WARN_ON_MISSING_CONTIG true --VERBOSITY DEBUG --MAX_RECORDS_IN_RAM 100000 --REFERENCE_SEQUENCE hg38_chr_only_and_herpes.fa --LOG_FAILED_INTERVALS true --WRITE_ORIGINAL_POSITION false --WRITE_ORIGINAL_ALLELES false --LIFTOVER_MIN_MATCH 1.0 --ALLOW_MISSING_FIELDS_IN_HEADER false --RECOVER_SWAPPED_REF_ALT false --TAGS_TO_REVERSE AF --TAGS_TO_DROP MAX_AF --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --CREATE_INDEX false --CREATE_MD5_FILE false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Jul 25, 2019 11:29:33 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
[Thu Jul 25 11:29:33 CEST 2019] Executing as [email protected] on Linux 3.10.0-693.11.6.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_212-b04; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.1.2.0
INFO 2019-07-25 11:29:34 LiftoverVcf Loading up the target reference genome.
DEBUG 2019-07-25 11:30:02 BlockCompressedOutputStream Using deflater: IntelDeflater
INFO 2019-07-25 11:30:02 LiftoverVcf Lifting variants over and sorting (not yet writing the output file.)
INFO 2019-07-25 11:30:05 LiftOver Interval chr6:892954-892955 failed to match chain 6 because intersection length 1 < minMatchSize 2.0 (0.5 < 1.0)
INFO 2019-07-25 11:30:05 LiftOver Interval chr6:906262-906264 failed to match chain 6 because intersection length 1 < minMatchSize 3.0 (0.33333334 < 1.0)
INFO 2019-07-25 11:30:05 LiftOver Interval chr6:916025-916031 failed to match chain 6 because intersection length 3 < minMatchSize 7.0 (0.42857143 < 1.0)
INFO 2019-07-25 11:30:05 LiftOver Interval chr6:927955-927959 failed to match chain 6 because intersection length 1 < minMatchSize 5.0 (0.2 < 1.0)
[Thu Jul 25 11:30:27 CEST 2019] picard.vcf.LiftoverVcf done. Elapsed time: 0.95 minutes.
Runtime.totalMemory()=8048222208

java.lang.ArrayIndexOutOfBoundsException: -1
at picard.util.LiftoverUtils.lambda$leftAlignVariant$3(LiftoverUtils.java:376)
at java.util.stream.Collectors.lambda$groupingBy$45(Collectors.java:907)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1628)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at picard.util.LiftoverUtils.leftAlignVariant(LiftoverUtils.java:376)
at picard.util.LiftoverUtils.reverseComplementVariantContext(LiftoverUtils.java:178)
at picard.util.LiftoverUtils.liftVariant(LiftoverUtils.java:76)
at picard.vcf.LiftoverVcf.doWork(LiftoverVcf.java:396)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
at org.broadinstitute.hellbender.Main.main(Main.java:291)
```

Answers

  • bhanuGandham (Cambridge, MA; Administrator, Broadie, Moderator)

    Hi @NikolaosLykos

    Please post the version of GATK and the exact command you are using.

  • NikolaosLykos (Geneva; Member)
    Hi @bhanuGandham,

    I am using GATK version 4.1.2.0, and the command I am using is the following:

    ```
    picard-tools LiftoverVcf --INPUT chr6.vcf.gz --OUTPUT chr6_hg38.vcf.gz --REFERENCE_SEQUENCE hg38_chr_only_and_herpes.fa --REJECT rejection_file_chr6.vcf --CHAIN hg19ToHg38.over.chain.gz --WARN_ON_MISSING_CONTIG true --VERBOSITY DEBUG --MAX_RECORDS_IN_RAM 100000
    ```
  • bhanuGandham (Cambridge, MA; Administrator, Broadie, Moderator)

    Hi @NikolaosLykos

    Can you please check the validity of the input VCF by running ValidateVariants on it? To perform VCF format and all strict validations, provide dbSNP as one of the inputs; if dbSNP is not available, run it with --validation-type-to-exclude ALL, as shown in the examples here: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_variantutils_ValidateVariants.php

  • NikolaosLykos (Geneva; Member)
    Hi @bhanuGandham

    I have checked the validity of the VCF I am trying to lift over, and it seems to be fine. The command I've used is:

    gatk ValidateVariants \
        -R hg19_chr_only_and_herpes.fa \
        -V chr6.vcf.gz \
        --validation-type-to-exclude ALL

    Here is the stdout of ValidateVariants:

    ```
    Using GATK jar /Home/Software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /Home/Software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar ValidateVariants -R hg19_chr_only_and_herpes.fa -V chr6.vcf.gz --validation-type-to-exclude ALL
    11:10:59.780 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Home/Software/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    Jul 30, 2019 11:11:01 AM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
    INFO: Failed to detect whether we are running on Google Compute Engine.
    11:11:01.932 INFO ValidateVariants - ------------------------------------------------------------
    11:11:01.936 INFO ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
    11:11:01.939 INFO ValidateVariants - For support and documentation go to
    11:11:01.942 INFO ValidateVariants - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    11:11:01.945 INFO ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_222-b10
    11:11:01.948 INFO ValidateVariants - Start Date/Time: July 30, 2019 11:10:59 AM CEST
    11:11:01.952 INFO ValidateVariants - ------------------------------------------------------------
    11:11:01.955 INFO ValidateVariants - ------------------------------------------------------------
    11:11:01.959 INFO ValidateVariants - HTSJDK Version: 2.19.0
    11:11:01.961 INFO ValidateVariants - Picard Version: 2.19.0
    11:11:01.963 INFO ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
    11:11:01.965 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:11:01.967 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:11:01.967 INFO ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:11:01.968 INFO ValidateVariants - Deflater: IntelDeflater
    11:11:01.969 INFO ValidateVariants - Inflater: IntelInflater
    11:11:01.969 INFO ValidateVariants - GCS max retries/reopens: 20
    11:11:01.971 INFO ValidateVariants - Requester pays: disabled
    11:11:01.971 INFO ValidateVariants - Initializing engine
    11:11:02.867 INFO FeatureManager - Using codec VCFCodec to read file file://chr6.vcf.gz
    11:11:03.058 INFO ValidateVariants - Done initializing engine
    11:11:03.059 INFO ProgressMeter - Starting traversal
    11:11:03.061 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
    11:11:13.092 INFO ProgressMeter - chr6:21824926 0.2 78000 466646.7
    11:11:23.115 INFO ProgressMeter - chr6:42940365 0.3 182000 544556.9
    11:11:33.146 INFO ProgressMeter - chr6:75513075 0.5 286000 570402.9
    11:11:43.224 INFO ProgressMeter - chr6:108888868 0.7 392000 585642.8
    11:11:53.302 INFO ProgressMeter - chr6:144861946 0.8 498000 594733.4
    11:12:02.524 INFO ProgressMeter - chr6:170678670 1.0 594905 600297.7
    11:12:02.529 INFO ProgressMeter - Traversal complete. Processed 594905 total variants in 1.0 minutes.
    11:12:02.530 INFO ValidateVariants - Shutting down engine
    [July 30, 2019 11:12:02 AM CEST] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 1.05 minutes.
    ```
  • bhanuGandham (Cambridge, MA; Administrator, Broadie, Moderator)

    Hi @NikolaosLykos

    It looks like when other users have had this issue, the .dict file and/or .fai file was missing for the reference. Can you confirm the FASTA file has a .dict file and .fai file in the same directory?
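    As a quick sanity check, the presence of the two companion files can be verified with a short shell snippet. This is only a sketch: the reference filename is the one used in this thread, and the `samtools faidx` / `gatk CreateSequenceDictionary` commands mentioned in the comments are the usual way to create the files, assuming those tools are installed.

    ```shell
    # GATK/Picard expect two companion files next to the reference FASTA:
    #   <ref>.fa.fai - the FASTA index, typically created with: samtools faidx <ref>.fa
    #   <ref>.dict   - the sequence dictionary (note: .fa extension is replaced),
    #                  typically created with: gatk CreateSequenceDictionary -R <ref>.fa
    ref="hg38_chr_only_and_herpes.fa"
    for f in "${ref}.fai" "${ref%.fa}.dict"; do
        [ -e "$f" ] || echo "missing: $f"
    done
    ```

    If either file is reported missing, create it and rerun LiftoverVcf from the same directory.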
