
SplitNCigarReads java.lang.ArrayIndexOutOfBoundsException

alexbmp Member
edited July 2018 in Ask the GATK team

Dear GATK team,

I'm using gatk4-4.0.5.1-0 on CentOS, installed through conda.
I'm posting this question here because I could not find an answer to this issue online.

I'm running gatk SplitNCigarReads on my RNA-seq sorted BAM file.
The sorted BAM was produced with gatk AddOrReplaceReadGroups, using the -SO coordinate option.

Before the error message itself, here is the command recorded in the log file (to which both stdout and stderr were saved):

Using GATK jar /home/genomics_cf/.conda/envs/exome/share/gatk4-4.0.5.1-0/gatk-package-4.0.5.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Djava.io.tmpdir=/home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/tmp -jar /home/genomics_cf/.conda/envs/exome/share/gatk4-4.0.5.1-0/gatk-package-4.0.5.1-local.jar SplitNCigarReads -I /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/ParkK_S12.srt.bam -O /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/ParkK_S12.mqfix.bam -R /storage/data3/public_data/Broad_DBs/ucsc.hg19.fasta -skip-mq-transform false
15:31:09.392 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/genomics_cf/.conda/envs/exome/share/gatk4-4.0.5.1-0/gatk-package-4.0.5.1-local.jar!/com/intel/gkl/native/libgkl_compression.so

And below is the error message (including some normal-looking log lines; please ignore the Korean text in brackets, it is just the date and time of the message):

15:59:00.033 INFO  ProgressMeter -       chr14:50053490             27.8              66479000        2388541.4
15:59:10.034 INFO  ProgressMeter -       chr14:50320395             28.0              66895000        2389179.7
15:59:20.146 INFO  ProgressMeter -       chr14:50320427             28.2              67280000        2388554.3
15:59:20.653 INFO  SplitNCigarReads - Shutting down engine
[2018년 7월 17일 (화) 오후 3시 59분 20초] org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads done. Elapsed time: 28.19 minutes.
Runtime.totalMemory()=3288858624
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.overhangingBasesMismatch(OverhangFixingManager.java:313)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.fixSplit(OverhangFixingManager.java:259)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.addReadGroup(OverhangFixingManager.java:209)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.splitNCigarRead(SplitNCigarReads.java:270)
        at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.firstPassApply(SplitNCigarReads.java:180)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.lambda$traverseReads$0(TwoPassReadWalker.java:62)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverseReads(TwoPassReadWalker.java:60)
        at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverse(TwoPassReadWalker.java:42)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:994)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)

I've never seen this kind of error before. Is there any way to fix or work around it?
Thank you for your support.

Best,
Seongmin

Answers

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @alexbmp
    Hi Seongmin,

    I think this thread will help. Specifically, have a look at the post from mizetrav in July 2017.

    -Sheila

  • alexbmp Member

    @Sheila
    Hi Sheila, I've tried the suggested solutions, but they do not seem to solve my problem.
    I've used the following pipeline to produce the CIGAR-split output BAM, and I still get the same error:

    [1] STAR alignment step

    STAR --genomeDir {input.ref} --runThreadN {params.thread} \
    --readFilesIn {input.fq1} {input.fq2} \
    --readFilesCommand "gunzip -c" --outSAMtype BAM Unsorted --outSAMmapqUnique 60 \
    --sjdbOverhang 100 --twopassMode Basic --limitOutSJcollapsed 1000000 \
    --outFileNamePrefix {params.out_prefix} > {log} 2>&1
    

    [2] GATK AddOrReplaceReadGroups step

    gatk --java-options '-Xmx4g -Djava.io.tmpdir={input.tmpdir}' \
    AddOrReplaceReadGroups -I {input.bam} -O {output} --CREATE_INDEX=true \
    -PL illumina -LB {params.id} -PU {params.id} -SM {params.id} -SO coordinate >> {log} 2>&1
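
    (As a quick sanity check on the step [2] output, the header can be inspected directly. This is only a minimal sketch under my own assumptions: samtools is on the PATH and the path is one of my sample BAMs. The @HD line should report SO:coordinate and an @RG line should be present.)

    # Hedged example: confirm sort order and read-group lines in the step [2] output
    samtools view -H /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.srt.bam | grep -E '^@HD|^@RG'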
    

    [3] GATK SplitNCigarReads step

    gatk --java-options '-Xmx4g -Djava.io.tmpdir={input.tmpdir}' \
    SplitNCigarReads -I {input.bam} -O {output} \
    -R {input.ref} >> {log} 2>&1
    

    When I check the BAM file produced by step [2] using ValidateSamFile, as follows,

    java -Xmx4g -jar /packages/picard-2.9.0.jar ValidateSamFile I=/home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.srt.bam IGNORE_WARNINGS=true MODE=VERBOSE
    

    the output is a single line stating "No errors found".

    The log from step [3] still shows:

    14:20:22.863 INFO  ProgressMeter -       chr14:50053475             47.8              72923000        1524627.3
    14:20:32.862 INFO  ProgressMeter -       chr14:50053490             48.0              73361000        1528458.7
    14:20:42.871 INFO  ProgressMeter -       chr14:50320396             48.2              73702000        1530245.4
    14:20:52.036 INFO  SplitNCigarReads - Shutting down engine
    [2018년 7월 25일 (수) 오후 2시 20분 52초] org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads done. Elapsed time: 48.34 minutes.
    Runtime.totalMemory()=2238185472
    java.lang.ArrayIndexOutOfBoundsException: -3
            at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.overhangingBasesMismatch(OverhangFixingManager.java:313)
            at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.fixSplit(OverhangFixingManager.java:259)
            at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.addReadGroup(OverhangFixingManager.java:209)
            at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.splitNCigarRead(SplitNCigarReads.java:270)
            at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.firstPassApply(SplitNCigarReads.java:180)
            at org.broadinstitute.hellbender.engine.TwoPassReadWalker.lambda$traverseReads$0(TwoPassReadWalker.java:62)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
            at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
            at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
            at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
            at java.util.Iterator.forEachRemaining(Iterator.java:116)
            at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
            at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
            at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
            at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
            at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
            at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
            at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
            at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverseReads(TwoPassReadWalker.java:60)
            at org.broadinstitute.hellbender.engine.TwoPassReadWalker.traverse(TwoPassReadWalker.java:42)
            at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:984)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:135)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
            at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
            at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
            at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
            at org.broadinstitute.hellbender.Main.main(Main.java:289)
    Using GATK jar /home/genomics_cf/.conda/envs/rainbow/share/gatk4-4.0.6.0-0/gatk-package-4.0.6.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx4g -Djava.io.tmpdir=/home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/tmp -jar /home/genomics_cf/.conda/envs/rainbow/share/gatk4-4.0.6.0-0/gatk-package-4.0.6.0-local.jar SplitNCigarReads -I /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.srt.bam -O /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.mqfix.bam -R /storage/data3/public_data/Broad_DBs/ucsc.hg19.fasta
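
    (Both runs stop right after the ProgressMeter passes chr14:50320xxx, so one way to narrow this down might be to re-run the tool on just that region and look at the reads there. This is only a rough sketch under my own assumptions: the interval is a guess based on the ProgressMeter output, and it relies on the standard GATK -L interval argument and on samtools being installed.)

    # Reproduce the crash faster by restricting to the suspect region (interval is a guess)
    gatk --java-options '-Xmx4g' SplitNCigarReads \
        -I /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.srt.bam \
        -O /tmp/NhoY_S6.chr14_test.bam \
        -R /storage/data3/public_data/Broad_DBs/ucsc.hg19.fasta \
        -L chr14:50300000-50400000

    # List reads in the same window whose CIGAR contains an N operator, to eyeball anything unusual
    samtools view /home/genomics_cf/180530_NB501839_0019_AHKG5LBGX5/03_sort_bam/NhoY_S6.srt.bam chr14:50300000-50400000 \
        | awk '$6 ~ /N/ {print $1, $4, $6}' | head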
    

    I'm probably doing something wrong here, but I can't figure out what.
    Could you please help me with this?
    If you need any other logs or output, I'd be happy to provide them.

    Best,
    Seongmin

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @alexbmp
    Hi Seongmin,

    It looks like this may be related to this issue. Let me check and get back to you.

    -Sheila

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @alexbmp
    Hi Seongmin,

    Can you submit a bug report so we can look into this locally? Instructions are here.

    Thanks,
    Sheila

  • alexbmp Member
    edited August 2018

    @Sheila
    Dear Sheila,
    I've submitted the bug report following the instructions you provided.
    The archive file name is genomics_cf.toBroad.tar.gz.
    It includes the whole BAM (the snippet does not reproduce the bug), the BAI, the bug-report script, and the corresponding log file.
    Thank you :smile:
    Seongmin

    GitHub issue #3146 filed by Sheila (state: open).

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @alexbmp
    Hi Seongmin,

    Great, thanks. I will have a look soon.

    -Sheila
