BaseRecalibrator Error

michel Posts: 9 Member
edited October 2012 in Ask the GATK team

Dear GATK-Team

When running BaseRecalibrator with my own selected SNPs, I got the following stack trace error:

ERROR stack trace

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.isLowQualityBase(BaseRecalibrator.java:205)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:228)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-6-g6a46042):

Do I have to filter out all low-quality SNPs in the known.vcf file before feeding it into the BaseRecalibrator, or is this a software error? Thank you! Michel

Best Answer

  • rpoplin Posts: 122 GATK Developer mod
    Answer ✓

    Unfortunately I wasn't able to run the GATK on your sam file (I think it is missing its header?), but I was able to fix another problem in the BaseRecalibrator related to your reads. Hopefully this will fix your issue. Version 2.1-13 should appear on the website later today.

    Thanks for all your help with this,

Answers

  • Geraldine_VdAuwera Posts: 6,641 Administrator, GATK Developer admin

    You should definitely not be feeding low-quality SNPs as knowns to the BaseRecalibrator; that can lead to bad recalibration results.

    However that may not be the cause of your problem here -- this looks like a bug that has been fixed. Can you please upgrade to the latest version (2.1-11) and try again?

    Geraldine Van der Auwera, PhD

  • michel Posts: 9 Member

    Thank you for the fast answer, I changed to the latest version, but the error unfortunately remained the same.

    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -1
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.isLowQualityBase(BaseRecalibrator.java:205)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:228)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-11-g13c0244):
    ERROR

    Do you have any other suggestions?

  • Geraldine_VdAuwera Posts: 6,641 Administrator, GATK Developer admin

    OK, we've looked into it -- it's a new bug and we think we have a solution. Can you send us an excerpt from your file that contains the offending region so we can test our fix?

    Geraldine Van der Auwera, PhD

  • michel Posts: 9 Member
    edited October 2012

    Hi, I am not sure I got the right region, because I don't know exactly where GATK hit the error.

    INFO  17:24:05,103 TraversalEngine - Peaxi.v0.1.1.Scf2073:7571    2.04e+07    2.0 h        6.0 m     23.3%         8.8 h     6.7 h
    INFO  17:24:40,247 TraversalEngine - Peaxi.v0.1.1.Scf2104:33179    2.05e+07    2.0 h        6.0 m     23.3%         8.8 h     6.7 h
    INFO  17:25:11,322 TraversalEngine - Peaxi.v0.1.1.Scf2113:75472    2.05e+07    2.1 h        6.0 m     23.4%         8.8 h     6.7 h
    INFO  17:25:30,527 GATKRunReport - Uploaded run statistics report to AWS S3
    

    The output ended at Scf2113, but I assume the error is in the scaffolds that follow. I created a SAM file of scaffolds 2113 to 2123 using samtools view. I hope that's what you are looking for. Thank you for testing!

  • michel Posts: 9 Member

    Sorry, I couldn't upload the file via this page, so here is the Dropbox link to the file: https://www.dropbox.com/s/ctuenzjnbcc6sga/realigned-bwa-error-region.sam

  • Geraldine_VdAuwera Posts: 6,641 Administrator, GATK Developer admin

    Hi Michel, can you also upload your reference file?

    Geraldine Van der Auwera, PhD

  • rpoplin Posts: 122 GATK Developer mod

    Hi Michel,

    We believe this is fixed in the latest version of the GATK available on the website. Thank you for providing the files to help us track this down.

    Cheers,

  • michel Posts: 9 Member

    Thank you for your effort! I reran the files with the newest version. The error persists in the one file I sent you (same error, same place):

    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -1
        at org.broadinstitute.sting.utils.recalibration.ReadCovariates.getKeySet(ReadCovariates.java:31)
        at org.broadinstitute.sting.gatk.walkers.bqsr.AdvancedRecalibrationEngine.updateDataForPileupElement(AdvancedRecalibrationEngine.java:71)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:244)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
        at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.1-12-ga99c19d):

    However, three other alignment files worked well, so I am starting to suspect that the error is due to the file, not the program. Unfortunately, I need the failing file to create a reference SNP set. Do you have other suggestions for the cause of the error?

    Cheers

  • rpoplin Posts: 122 GATK Developer mod

    That's actually a different error. We'll take a look at it.

    Thanks,

  • michel Posts: 9 Member

    BaseRecalibrator worked perfectly on the files with version 2.1-13. That is some really good software support you provide here! Kudos to you.

  • rpoplin Posts: 122 GATK Developer mod

    Thanks! I'm glad we were able to help. Cheers,

  • ymw Posts: 9 Member

    Hi,

    I ran into a problem similar to the one described in this thread, but I can't find a solution here. When I run BaseRecalibrator with a VCF file produced from my own BAM file, I get a "stack trace" error (full error message shown below). My study species is a non-model species, so I have no known SNP sites and have to run UnifiedGenotyper-BaseRecalibrator-PrintReads repeatedly, starting from the original BAM file. BaseRecalibrator ran fine in the first round but failed in the second. The GATK version is 2.3-0. I encountered the same problem with two different BAM files.

    $ java -Xmx30g -jar GenomeAnalysisTK.jar -R wholegenome.fa -I bamboo120Grecal.bam -T BaseRecalibrator -cov CycleCovariate -cov ContextCovariate -knownSites bamboo120GrecalVariant.vcf -nct 10 -o bamboo120Grecal2.grp --fix_misencoded_quality_scores -fixMisencodedQuals

    The following error message occurs:

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.ArrayIndexOutOfBoundsException: -4
        at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
        at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
        at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateBAQArray(BaseRecalibrator.java:428)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:243)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:112)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:203)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:191)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler$MapReduceJob.run(NanoScheduler.java:468)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.3-0-g9593e74):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: -4
    ERROR ------------------------------------------------------------------------------------------

    Is there a solution available now?

    Thanks

  • ebanks Posts: 684 GATK Developer mod

    Hi ymw,

    Thanks for reporting this. The problem seems to be that you either have: 1) a mixture of well-encoded and mis-encoded reads in your file, or 2) base qualities that are extremely poorly calibrated and that span too large a range. I will add a patch (that will be available in version 2.4) that exits more gracefully with a better error message, but it's not going to help you unfortunately. You need to go back and fix this at the source because there's just something wrong with your data. Good luck and sorry to be the bearer of bad news.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • ymw Posts: 9 Member

    Hi Eric,

    I figured out the problem, and maybe other users will be interested to know. The problem is that I mixed two versions of GATK when analyzing this data set. I used GATK 2.1 for local realignment and GATK 2.3 (once it was available) for base quality recalibration. When I redid all the analyses with GATK 2.3, the problem was solved. Best,

    Chih-Ming

  • Geraldine_VdAuwera Posts: 6,641 Administrator, GATK Developer admin
    edited January 2013

    Thanks for reporting your solution!

    Geraldine Van der Auwera, PhD

  • priesgo Posts: 19 Member

    Hi there,

    I'm getting a similar error using the base recalibrator on 1000 Genomes SOLiD data, but in my case I'm processing all of it with the same version of GATK 2.3. My command is:

    java -Xmx15g -jar ~/software/GenomeAnalysisTKLite-2.3-4-gb8f1308/GenomeAnalysisTKLite.jar -T BaseRecalibrator -l INFO -R human_g1k_v37.fasta --knownSites 00-All-build135.vcf -I NA12814.mapped.SOLID.bfast.CEU.realigned.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate --out NA12814.recalibration.grp -solid_nocall_strategy LEAVE_READ_UNRECALIBRATED --disable_indel_quals -nct 8 --filter_mismatching_base_and_quals
    

    On the other hand, I'm getting a different out-of-bounds index; I'm not sure if that gives you any information:

    java.lang.ArrayIndexOutOfBoundsException: -6
        at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
        at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
        at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
    

    I have observed in the 1000 genomes supplementary information that GATK was only employed to detect variants on Illumina data. Is it just a coincidence or did you have any issues with g1k SOLiD data?

    Thanks in advance! Pablo.

  • Geraldine_VdAuwera Posts: 6,641 Administrator, GATK Developer admin

    Hi Pablo, could you please upload a BAM snippet for us to test?

    Instructions here if needed: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Geraldine Van der Auwera, PhD

  • priesgo Posts: 19 Member

    OK, I took a snippet for chromosome 22 and I could reproduce the same error (well the index out of bounds now was 11...). Everything is uploaded in ftp.broadinstitute.org/priesgo.NA12842.22.zip

    By the way, I was able to process this very same data with the old GATK 1.6.5.

    Thanks Geraldine! Pablo.

  • ebanks Posts: 684 GATK Developer mod

    @priesgo: I'm not sure where you got this BAM file but it is completely invalid and malformed. In the future, please run Picard's ValidateSamFile on your BAMs before sending us bug reports.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • priesgo Posts: 19 Member

    As you say, running Picard's ValidateSamFile gives an error:

    Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once.  -1: SRR097794.39113769
    

    This is because the aligner may report more than one possible mapping position for each read. But is this incorrect? This data comes from the 1000 Genomes Project, and to rule out that my own manipulations introduced the error, I reproduced it with the raw data. Is there any other malformation?

    This is the command now; I had to add --fix_misencoded_quality_scores:

    java -Xmx15g -jar ~/software/GenomeAnalysisTKLite-2.3-4-gb8f1308/GenomeAnalysisTKLite.jar -T BaseRecalibrator -l INFO -R human_g1k_v37.fasta --knownSites 00-All-build135.vcf -I NA12814.mapped.SOLID.bfast.CEU.22.bam -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate --out NA12814.recalibration.grp -solid_nocall_strategy LEAVE_READ_UNRECALIBRATED --disable_indel_quals -nct 8 --filter_mismatching_base_and_quals --fix_misencoded_quality_scores
    

    This is the output:

    java.lang.ArrayIndexOutOfBoundsException: -8
        at org.broadinstitute.sting.utils.baq.BAQ.calcEpsilon(BAQ.java:158)
        at org.broadinstitute.sting.utils.baq.BAQ.hmm_glocal(BAQ.java:225)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:542)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:595)
        at org.broadinstitute.sting.utils.baq.BAQ.calcBAQFromHMM(BAQ.java:530)
        at org.broadinstitute.sting.utils.baq.BAQ.baqRead(BAQ.java:663)
    

    The data comes from the 1000 Genomes Project repository and I just selected the chromosome 22 to deal with a smaller file.

  • ebanks Posts: 684 GATK Developer mod

    You should go back to the 1000 Genomes Project then and make sure you are pulling down the correct file, because all of the base qualities were mis-encoded in the file you uploaded to us. The minimum value is ASCII 33 but you had values that were lower than that. At this point, the problem is not with the GATK so there's really nothing else we can do to help here. Good luck!
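
    The range check described here can be sketched in a few lines of Python. This is only an illustration, assuming SAM text where each quality is stored as a character with code Phred + 33, so valid characters run from '!' (33) to '~' (126). The function name is hypothetical and is not part of GATK or Picard.

```python
# Sketch: flag quality characters outside the printable Phred+33 range.
# Assumption: qualities come from SAM text, encoded as chr(phred + 33).

MIN_QUAL_CHAR = 33   # '!' encodes Phred 0
MAX_QUAL_CHAR = 126  # '~' encodes Phred 93


def out_of_range_positions(qual):
    """Return the 0-based positions whose character is outside '!'..'~'."""
    return [
        i for i, ch in enumerate(qual)
        if not MIN_QUAL_CHAR <= ord(ch) <= MAX_QUAL_CHAR
    ]


if __name__ == "__main__":
    # A well-formed string yields no positions.
    print(out_of_range_positions("II5I"))
    # Two characters below ASCII 33 are reported by position.
    print(out_of_range_positions("I\x1fI\x02"))
```

    Any read for which this returns a non-empty list has at least one quality below the minimum mentioned above.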

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • priesgo Posts: 19 Member

    Yes, sorry for the malformed file. I saw the mis-encoded base call qualities but did not want to distract from the main point; now I see they are related. In fact, the base call qualities were mis-encoded by GATK's "--fix_misencoded_quality_scores" option, which I wrongly applied. This may be causing the error shown above.
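
    A rough sketch of why such an over-correction could fail, under one loud assumption: the fix shifts every score down by 31, the difference between the Phred+64 and Phred+33 offsets. That constant is my assumption, not a confirmed detail of the GATK implementation, and the names below are hypothetical. Applied to scores that are already Phred+33, the shift pushes every score below 31 negative, which would be consistent with the negative array indices in the traces above.

```python
# Hypothetical illustration, not GATK code.
# Assumption: the fix subtracts 64 - 33 = 31 from every quality score.
ASSUMED_FIX_SHIFT = 64 - 33


def apply_offset_fix(phred_scores):
    """Shift scores as if they had been decoded with the wrong offset."""
    return [q - ASSUMED_FIX_SHIFT for q in phred_scores]


def negative_scores(phred_scores):
    """Return the scores that are invalid because they fell below zero."""
    return [q for q in phred_scores if q < 0]


# Scores that really were Phred+64 but were decoded with offset 33 are
# 31 too high, so the shift restores them.
misread = [33, 51, 71]
print(apply_offset_fix(misread))

# Scores that were already correct get pushed below zero wherever q < 31.
already_correct = [1, 1, 18, 6, 1, 40, 52]
over_corrected = apply_offset_fix(already_correct)
print(over_corrected)
print(negative_scores(over_corrected))
```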

    But why did I call it? Because when running without this parameter I got the following message:

    ##### ERROR MESSAGE: SAM/BAM file    
    SAMFileReader{/home/priesgo/data/sequences/1000G_releases/20110521/NA12814/exome_alignment/NA12814.22.bam} appears to be using the wrong encoding for quality scores: we encountered an extremely high quality score of 63; please see the GATK --help documentation for options related to this error
    

    I found this as a possible solution by @ymw in this entry, but it does not seem to apply to my case.

    So, let's look at this encoded base call quality string:

    ""3'"IUC34;U\FMI5I\]]_`L<FYZY_^_`\@
    

    The "`" corresponds to 63 in Phred scale and translating some of these characters gives us:

    1 1 18 6 1 40 52 ... 60 60 62 63 43 ... 62 63 59 31  
    

    We can see that the values are correctly distributed in the range from 1 up to 63. Let me ask, is there a way to compress this base call quality range in GATK?
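
    For anyone who wants to repeat this translation, here is a minimal Python sketch. It assumes the Phred+33 convention (score = character code minus 33), which matches the numbers quoted above. The function name is my own.

```python
# Sketch: decode a quality string, assuming Phred+33 encoding.
PHRED33_OFFSET = 33


def decode_phred33(qual):
    """Convert each quality character to its Phred score."""
    return [ord(ch) - PHRED33_OFFSET for ch in qual]


# The quality string quoted above. A raw triple-quoted literal keeps the
# backslashes and both kinds of quote characters exactly as written.
QUAL = r'''""3'"IUC34;U\FMI5I\]]_`L<FYZY_^_`\@'''

scores = decode_phred33(QUAL)
print(scores[:7])               # first seven scores
print(scores[-4:])              # last four scores
print(min(scores), max(scores)) # observed range
```

    The maximum of 63 is the value that the error message complains about.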

    Sorry for the long post and thanks again. By the way this might be better in another post...

    Pablo.

  • ebanks Posts: 684 GATK Developer mod

    No, but you can have the GATK process the file with suspicious quals with the --allow_potentially_misencoded_quality_scores argument.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • priesgo Posts: 19 Member

    Thanks! It worked. What a mess with the qualities...
