ERROR stack trace java.lang.ArrayIndexOutOfBoundsException: -4

ngon, Chicago, IL (Member)

I received the following error message while running GATK v.3.3-0 BaseRecalibrator on a subset of my data. Here is some information which might be relevant:

  • I ran BaseRecalibrator on a list of 518 samples sequenced on 4 Illumina flow cells.
  • Each lane/RG contains up to 24 samples which were demuxed before being handed off to me (there is a separate bam file for each sample).
  • Libraries were prepared using the genotyping-by-sequencing (GBS) method (DNA was fragmented via restriction digestion).
  • Prior to this step, I realigned each sample around known indels and used Picard to change the read groups to look like this: @RG ID:D7LYMFP1:175:4 PU:D7LYMFP1:175:4 LB:gtatt SM:22877 PL:ILLUMINA
  • I have requested that my sysadmin install GATK 3.4-46.

Could it have something to do with the flowcells that were included in this group? Two of them were sequenced on a different Illumina machine. Positive controls for these 2 flow cells have hyphens in the file names (e.g. 26501-LG.bam). I ran BaseRecalibrator using the same commands on a different set of 4 flow cells (all of which were sequenced on the same machine, and no file names have hyphens or other weird characters), and it appears to have worked although it timed out before I could get the results (the estimated time for the analysis was 49.7w!!!).

Your help is appreciated.


Program Args: -T BaseRecalibrator -R:REFSEQ mm10.fasta -I bqsr.1.list -knownSites:VCF LGSM.mm10.orderedN.vcf -o bqsr.1.table
...
INFO 11:26:43,287 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:26:43,761 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 11:26:43,786 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
...
INFO 11:26:55,137 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 11.35
INFO 11:27:05,687 GenomeAnalysisEngine - Preparing for traversal over 518 BAM files
INFO 11:27:05,793 GenomeAnalysisEngine - Done preparing for traversal
INFO 11:27:05,796 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:27:05,797 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 11:27:05,809 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
INFO 11:27:05,979 BaseRecalibrator - The covariates being used here:
INFO 11:27:05,980 BaseRecalibrator - ReadGroupCovariate
INFO 11:27:05,982 BaseRecalibrator - QualityScoreCovariate
INFO 11:27:05,983 BaseRecalibrator - ContextCovariate
INFO 11:27:05,984 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 11:27:05,985 BaseRecalibrator - CycleCovariate
INFO 11:27:06,020 ReadShardBalancer$1 - Loading BAM index data
INFO 11:27:06,507 ReadShardBalancer$1 - Done loading BAM index data
INFO 11:27:28,334 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -4
at org.broadinstitute.gatk.utils.baq.BAQ.calcEpsilon(BAQ.java:185)
at org.broadinstitute.gatk.utils.baq.BAQ.hmm_glocal(BAQ.java:272)
at org.broadinstitute.gatk.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:553)
at org.broadinstitute.gatk.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:610)
at org.broadinstitute.gatk.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:536)
at org.broadinstitute.gatk.utils.baq.BAQ.baqRead(BAQ.java:680)
at org.broadinstitute.gatk.tools.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:486)
at org.broadinstitute.gatk.tools.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:262)
at org.broadinstitute.gatk.tools.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:135)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
at org.broadinstitute.gatk.engine.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:107)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.3-0-g37228af):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: -4
ERROR ------------------------------------------------------------------------------------------

Done running BQSR.

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @ngon
    Hi,

    Let me know if the error persists with the latest version of GATK.

    Thanks,
    Sheila

  • ngon, Chicago, IL (Member)

    @Sheila,
    I submitted each flow cell as a separate array of jobs, and it ran without any problems (was probably faster as well). I'll update this post again if my system admin installs the latest version of GATK on our cluster in time for me to re-run the pipeline.
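
A rough sketch of how a combined BAM list could be split into one list per flow cell before submitting separate jobs. It assumes that everything before the final colon of the read group ID (e.g. D7LYMFP1:175 in D7LYMFP1:175:4) identifies the flow cell and that the last field is the lane; that reading of the ID is a guess, not something stated in this thread. The function names and output file names are hypothetical, and samtools is assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the last colon-separated field (assumed to be the
# lane) from a read group ID such as D7LYMFP1:175:4.
flowcell_key() {
    printf '%s\n' "${1%:*}"
}

# Print the ID of the first @RG line in a BAM header (uses `samtools view -H`).
first_rg_id() {
    samtools view -H "$1" |
        awk -F'\t' '/^@RG/ { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) { print substr($i, 4); exit } }'
}

# Append every BAM path in a list file to <prefix>.<key>.list,
# with colons in the key replaced by underscores.
split_list_by_flowcell() {
    local list_file=$1 prefix=$2 bam key
    while IFS= read -r bam; do
        [ -n "$bam" ] || continue          # skip empty lines
        key=$(flowcell_key "$(first_rg_id "$bam")")
        printf '%s\n' "$bam" >> "${prefix}.${key//:/_}.list"
    done < "$list_file"
}
```

Something like `split_list_by_flowcell bqsr.1.list bqsr.flowcell` would then give one .list file per key, each of which can be passed to -I in its own job. The lists are appended to, so old ones should be removed before re-running.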

  • ngon, Chicago, IL (Member)

    @Sheila
    Hi - I tried running BQSR using GATK 3.4-46 this afternoon. I got the same error.
    ctrl.bqsr.list contains a list of 85 bam files; as in my previous post, there are several unique RGs present.

    Here are the arguments I used:

    -T BaseRecalibrator
    -R:REFSEQ mm10.fasta
    -I ctrl.bqsr.list
    -knownSites:VCF mm10.sortedSNPs.vcf
    -knownSites:VCF mm10.sortedIndels.vcf
    -o ctrl.bqsr.table

    As far as I can tell, the error message is identical to the one I posted before, so I won't paste it here unless requested to.

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @ngon
    Hi,

    What version of Java are you using?

    Thanks,
    Sheila

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @ngon
    Hi again,

    This thread may help too, as there may be something wrong with one of your bam files: http://gatkforums.broadinstitute.org/discussion/4998/problem-with-baserecalibrator

    -Sheila
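
If one BAM file is suspected, one way to narrow it down might be to run the same step on each file separately and note which ones fail. The sketch below is only an illustration: find_failing_bams and bqsr_single are hypothetical names, GATK_JAR is an assumed variable, and the BaseRecalibrator arguments are copied from the post above. An error that only appears when files are processed together would not be caught this way.

```shell
#!/usr/bin/env bash
# Run a check command once per BAM listed in a file and print the BAMs for
# which the command exits non-zero.
# Usage: find_failing_bams LIST_FILE COMMAND [ARGS...]
find_failing_bams() {
    local list_file=$1 bam
    shift
    while IFS= read -r bam; do
        [ -n "$bam" ] || continue          # skip empty lines
        # stdin comes from /dev/null so the command cannot consume the list
        if ! "$@" "$bam" < /dev/null > /dev/null 2>&1; then
            printf '%s\n' "$bam"
        fi
    done < "$list_file"
}

# Hypothetical per-file check: BaseRecalibrator on a single BAM.
# GATK_JAR is an assumed variable pointing at the GATK jar.
bqsr_single() {
    java -jar "$GATK_JAR" -T BaseRecalibrator -R:REFSEQ mm10.fasta \
        -I "$1" \
        -knownSites:VCF mm10.sortedSNPs.vcf \
        -knownSites:VCF mm10.sortedIndels.vcf \
        -o "${1%.bam}.bqsr.table"
}
```

For example, `find_failing_bams ctrl.bqsr.list bqsr_single` would print the paths of the BAM files for which the single-file run failed.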

  • ngon, Chicago, IL (Member)

    Thanks @Sheila - I'll look into it. I'm using java/jdk1.8.0_45.

  • barbaraluisa, Brazil (Member)

    Hi,

    I ran into a similar error, but in my case it occurred while using GATK to calculate the depth of coverage for specific regions of the human genome, with the following command:

    java -jar $GATK -R $hg37Fas -T DepthOfCoverage -L 2:47630325-47630582 -L 2:47635522-47635698 -L 2:47637213-47637523 -L 2:47639529-47639705 -L 2:47641386-47641586 -L 2:47643415-47643588 -L 2:47656870-47657106 -L 2:47672669-47672801 -L 2:47690166-47690297 -L 2:47693792-47693952 -L 2:47698101-47698219 -L 2:47702164-47702430 -L 2:47703501-47703724 -L 2:47705421-47705674 -L 2:47707814-47708016 -L 2:47709904-47710103 -L 2:48010354-48010790 -L 2:48010329-48010760 -L 2:48017999-48018272 -L 2:48023049-48023230 -L 2:48025708-48026457 -L 2:48026445-48027064 -L 2:48026984-48027435 -L 2:48027426-48028332 -L 2:48030450-48030921 -L 2:48032015-48032210 -L 2:48032640-48032961 -L 2:48033307-48033476 -L 2:48033485-48033916 -L 2:48033929-48034054 -L 3:37034992-37035188 -L 3:37038047-37038236 -L 3:37042399-37042580 -L 3:37045840-37046023 -L 3:37048440-37048582 -L 3:37050246-37050416 -L 3:37053268-37053416 -L 3:37053459-37053633 -L 3:37055888-37056047 -L 3:37058850-37059109 -L 3:37061743-37061983 -L 3:37067076-37067549 -L 3:37070209-37070457 -L 3:37081622-37081832 -L 3:37083699-37083862 -L 3:37088955-37089234 -L 3:37089942-37090131 -L 3:37090393-37090567 -L 3:37091936-37092260 -L 7:6011268-6029558 -L 7:6029417-6029783 -L 7:6029478-6038867 -L 7:6038687-6038999 -L 7:6038786-6048724 -o file -I file.bam -geneList file.dict;

    However, instead of -4, my error message was 4, as shown below:

    Any recommendation on how to solve this issue ?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)

    Try validating your input bam file with ValidateSamFile in summary mode to see if there is anything wrong with it. Also, try running with the latest version of GATK to see if the error still occurs. If this is caused by a bug it has probably been fixed already.
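
For reference, a summary-mode validation with Picard might look like the sketch below. It uses the legacy KEY=value argument style; PICARD_JAR is an assumed variable pointing at a Picard jar, and validate_bam and DRY_RUN are hypothetical conveniences rather than part of Picard.

```shell
#!/usr/bin/env bash
# Validate one BAM with Picard ValidateSamFile in summary mode.
# PICARD_JAR is an assumed variable; set DRY_RUN=1 to print the command
# instead of running it.
validate_bam() {
    ${DRY_RUN:+echo} java -jar "$PICARD_JAR" ValidateSamFile I="$1" MODE=SUMMARY
}
```

Calling `validate_bam file.bam` should print a table of error and warning types with their counts, which shows whether the file has problems without listing every affected read.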

  • barbaraluisa, Brazil (Member)
    edited June 2016

    Hi @Geraldine_VdAuwera,

    I ran the latest version of GATK and the error still occurs. I then ran ValidateSamFile, which reported MISMATCH_FLAG_MATE_NEG_STRAND. I also tried mapping my sequences against a reference sequence from another database, and the error is the same. Any recommendation on how to fix my SAM file?

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie, admin)
    Try running Picard FixMateInformation on your file, that might help.
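
A minimal sketch of that suggestion, followed by a re-check of the result with ValidateSamFile in summary mode. It uses Picard's legacy KEY=value argument style; PICARD_JAR is an assumed variable, and fix_and_revalidate and DRY_RUN are hypothetical names. Writing to a separate output file keeps the original BAM untouched.

```shell
#!/usr/bin/env bash
# Fix mate information with Picard, then re-validate the fixed file.
# PICARD_JAR is an assumed variable; set DRY_RUN=1 to print the commands
# instead of running them.
fix_and_revalidate() {
    local in_bam=$1 out_bam=$2
    ${DRY_RUN:+echo} java -jar "$PICARD_JAR" FixMateInformation I="$in_bam" O="$out_bam" &&
        ${DRY_RUN:+echo} java -jar "$PICARD_JAR" ValidateSamFile I="$out_bam" MODE=SUMMARY
}
```

For example, `fix_and_revalidate file.bam file.fixed.bam` writes the repaired reads to file.fixed.bam and then summarizes any errors that remain in it.
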
  • barbaraluisa, Brazil (Member)

    Thanks for your suggestion, @Geraldine_VdAuwera.
    I have now tried FixMateInformation, but unfortunately the same error still occurs at the DepthOfCoverage step. I also had problems with the .bam files generated from the .sam files. Is it necessary to run FixMateInformation on all of the affected files, or can I run it on just the first generated file?
    Do you think that unfixed files can influence the HaplotypeCaller analysis? I ask because I couldn't find any error message at the variant calling step.
    Thanks a lot.
