GATK4 ReferenceBases Annotation Error: String index out of range

We haven't yet transitioned our pipeline to GATK4, so my first GATK4 (4.0.3.0) command was an attempt to apply the ReferenceBases annotation to VCFs that we had previously generated.

The invocation was:

./gatk VariantAnnotator -R /scratch/lym_myl_rsch/mma/Green_index/ucsc_hg19.fa -V "/scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf" -A ReferenceBases -O "/scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.refBases.vcf"

where the reference is the UCSC version of hg19 and the input VCF comes from GATK 3.7.0 UnifiedGenotyper.

I got the following error:

Using GATK jar /scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar VariantAnnotator -R /scratch/lym_myl_rsch/mma/Green_index/ucsc_hg19.fa -V /scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf -A ReferenceBases -O /scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.refBases.vcf
13:33:03.966 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:33:15.280 INFO  VariantAnnotator - ------------------------------------------------------------
13:33:15.281 INFO  VariantAnnotator - The Genome Analysis Toolkit (GATK) v4.0.3.0
13:33:15.281 INFO  VariantAnnotator - For support and documentation go to https://software.broadinstitute.org/gatk/
13:33:15.293 INFO  VariantAnnotator - Executing as [email protected] on Linux v2.6.32-431.23.3.el6.x86_64 amd64
13:33:15.293 INFO  VariantAnnotator - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_45-b14
13:33:15.294 INFO  VariantAnnotator - Start Date/Time: April 17, 2018 1:33:03 PM CDT
13:33:15.294 INFO  VariantAnnotator - ------------------------------------------------------------
13:33:15.294 INFO  VariantAnnotator - ------------------------------------------------------------
13:33:15.295 INFO  VariantAnnotator - HTSJDK Version: 2.14.3
13:33:15.295 INFO  VariantAnnotator - Picard Version: 2.17.2
13:33:15.295 INFO  VariantAnnotator - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:33:15.295 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:33:15.295 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:33:15.295 INFO  VariantAnnotator - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:33:15.295 INFO  VariantAnnotator - Deflater: IntelDeflater
13:33:15.295 INFO  VariantAnnotator - Inflater: IntelInflater
13:33:15.296 INFO  VariantAnnotator - GCS max retries/reopens: 20
13:33:15.296 INFO  VariantAnnotator - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
13:33:15.296 WARN  VariantAnnotator -

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

   Warning: VariantAnnotator is a BETA tool and is not yet ready for use in production

   !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


13:33:15.296 INFO  VariantAnnotator - Initializing engine
13:33:16.231 INFO  FeatureManager - Using codec VCFCodec to read file file:///scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf
13:33:16.308 INFO  VariantAnnotator - Done initializing engine
13:33:17.955 INFO  ProgressMeter - Starting traversal
13:33:17.956 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
13:33:18.087 INFO  VariantAnnotator - Shutting down engine
[April 17, 2018 1:33:18 PM CDT] org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator done. Elapsed time: 0.24 minutes.
Runtime.totalMemory()=766509056
java.lang.StringIndexOutOfBoundsException: String index out of range: 20
        at java.lang.String.substring(String.java:1951)
        at org.broadinstitute.hellbender.tools.walkers.annotator.ReferenceBases.annotate(ReferenceBases.java:47)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.annotateContext(VariantAnnotatorEngine.java:377)
        at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotator.apply(VariantAnnotator.java:233)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.lambda$traverse$0(VariantWalkerBase.java:110)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase$$Lambda$76/1861616277.accept(Unknown Source)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:512)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.VariantWalkerBase.traverse(VariantWalkerBase.java:108)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:893)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

The output file named at -O was created, but only the VCF headers were written.
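
A note on the trace: its top frame is java.lang.String.substring, which throws StringIndexOutOfBoundsException whenever the requested end index is larger than the string. The sketch below reproduces that failure mode in isolation. It is not the GATK source. The helper names, the 20-base width, and the idea that the reference window handed to the annotation is shorter than the slice it asks for are all assumptions made for illustration.

```java
public class SubstringRangeDemo {

    // Hypothetical helper: take a fixed-width slice from the start of a
    // reference window. Throws if width exceeds the window length.
    public static String fixedSlice(String windowBases, int width) {
        return windowBases.substring(0, width);
    }

    // Defensive variant: clamp the end index to the available bases.
    public static String clampedSlice(String windowBases, int width) {
        return windowBases.substring(0, Math.min(width, windowBases.length()));
    }

    public static void main(String[] args) {
        // Assumption: the window covers only a single REF base.
        String shortWindow = "A";

        System.out.println("clamped: " + clampedSlice(shortWindow, 20));

        try {
            fixedSlice(shortWindow, 20);
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message wording depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

If the assumption holds, the number in the message (20) would be the end index of the requested slice rather than anything about the VCF record itself, which would also be consistent with ValidateVariants finding nothing wrong with the file.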

The VCF itself is a single-sample VCF that's the output of UnifiedGenotyper as packaged in GATK 3.7.0. ValidateVariants in GATK 4.0.3.0 did not report any issues with this file:

./gatk ValidateVariants -V "/scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf" -R ~/Green_index/ucsc_hg19.fa
Using GATK jar /scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar ValidateVariants -V /scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf -R /scratch/lym_myl_rsch/mma/Green_index/ucsc_hg19.fa
13:47:35.015 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/scratch/lym_myl_rsch/mma/gatk-4.0.3.0/gatk-4.0.3.0/gatk-package-4.0.3.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:47:46.294 INFO  ValidateVariants - ------------------------------------------------------------
13:47:46.294 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.0.3.0
13:47:46.294 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
13:47:46.305 INFO  ValidateVariants - Executing as [email protected] on Linux v2.6.32-431.23.3.el6.x86_64 amd64
13:47:46.305 INFO  ValidateVariants - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_45-b14
13:47:46.306 INFO  ValidateVariants - Start Date/Time: April 17, 2018 1:47:34 PM CDT
13:47:46.306 INFO  ValidateVariants - ------------------------------------------------------------
13:47:46.306 INFO  ValidateVariants - ------------------------------------------------------------
13:47:46.306 INFO  ValidateVariants - HTSJDK Version: 2.14.3
13:47:46.307 INFO  ValidateVariants - Picard Version: 2.17.2
13:47:46.307 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:47:46.307 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:47:46.307 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:47:46.307 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:47:46.307 INFO  ValidateVariants - Deflater: IntelDeflater
13:47:46.307 INFO  ValidateVariants - Inflater: IntelInflater
13:47:46.307 INFO  ValidateVariants - GCS max retries/reopens: 20
13:47:46.307 INFO  ValidateVariants - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
13:47:46.308 INFO  ValidateVariants - Initializing engine
13:47:47.282 INFO  FeatureManager - Using codec VCFCodec to read file file:///scratch/lym_myl_rsch/mma/Green/Biovest/output/806696_B/806696_B.UnifiedGenotyper.vcf
13:47:47.347 INFO  ValidateVariants - Done initializing engine
13:47:47.347 INFO  ProgressMeter - Starting traversal
13:47:47.348 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
13:47:58.793 INFO  ProgressMeter -      chr14:106805508              0.2                  2000          10485.8
13:48:00.619 INFO  ProgressMeter -       chr22:22730534              0.2                  3174          14350.1
13:48:00.619 INFO  ProgressMeter - Traversal complete. Processed 3174 total variants in 0.2 minutes.
13:48:00.619 INFO  ValidateVariants - Shutting down engine
[April 17, 2018 1:48:00 PM CDT] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.43 minutes.
Runtime.totalMemory()=711458816
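
In the VariantAnnotator run the ProgressMeter logged no locus before the engine shut down, so the exception presumably occurs on or near the first record. A small sketch for pulling that record out of the VCF so it can be attached to a report; the class and method names are hypothetical and not part of GATK, and it assumes an uncompressed text VCF:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

public class FirstVcfRecord {

    // Return the first line that is neither empty nor a header line
    // (VCF header and column lines start with '#').
    public static Optional<String> firstRecord(BufferedReader in) {
        return in.lines()
                 .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                 .findFirst();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an uncompressed VCF file
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            System.out.println(firstRecord(in).orElse("(no data records found)"));
        }
    }
}
```

Running it against the input VCF prints the first data line, which shows the contig, position, and REF/ALT alleles of the record the annotator would have seen first.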

GitHub issue filed by Sheila: Issue Number 3078, State: open.
