BaseRecalibrator Key Error 2002

sheenams Posts: 9 Member
edited October 2012 in Ask the GATK team

I'm trying to run the BaseRecalibrator tool on my data and am getting the following error:

INFO 14:58:17,399 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,400 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-13-g1706365, Compiled 2012/10/12 19:21:06
INFO 14:58:17,400 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:58:17,400 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:58:17,401 HelpFormatter - Program Args: -T BaseRecalibrator -I /home/sheenams/gatk_test/LMG-206.GATKinitialrmdup.srt.bam -R /home/genetics/Genomes/gatk-bundle/human_g1k_v37.fasta -knownSites /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf -knownSites /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -knownSites /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf -o /home/sheenams/gatk_test/LMG-206.recal_data.csv -log /home/sheenams/gatk_test/LMG-206.gatk_log
INFO 14:58:17,401 HelpFormatter - Date/Time: 2012/10/17 14:58:17
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,407 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf to be VCF
INFO 14:58:17,409 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf to be VCF
INFO 14:58:17,410 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf to be VCF
INFO 14:58:17,414 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:58:17,463 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:58:17,479 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 14:58:17,487 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf
WARN 14:58:17,574 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 14:58:17,575 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf
WARN 14:58:17,589 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer
INFO 14:58:17,590 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf
WARN 14:58:17,603 VCFHeader - Found GL format, but no PL field. As the GATK now only manages PL fields internally automatically adding a corresponding PL field to your VCF header
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Alternate Allele Count' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has INTEGER but standard is A -- descriptions disagree; header has 'Global Allele Frequency based on AC/AN' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed'
INFO 14:58:18,093 BaseRecalibrator - The covariates being used here:
INFO 14:58:18,093 BaseRecalibrator - ReadGroupCovariate
INFO 14:58:18,093 BaseRecalibrator - QualityScoreCovariate
INFO 14:58:18,094 BaseRecalibrator - ContextCovariate
INFO 14:58:18,094 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 14:58:18,094 BaseRecalibrator - CycleCovariate
INFO 14:58:18,136 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 14:58:18,137 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 14:58:35,886 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 2002 is too large for dimension 2 (max is 2001)
at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:77)
at org.broadinstitute.sting.gatk.walkers.bqsr.AdvancedRecalibrationEngine.updateDataForPileupElement(AdvancedRecalibrationEngine.java:97)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:244)
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-13-g1706365):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Key 2002 is too large for dimension 2 (max is 2001)
ERROR ------------------------------------------------------------------------------------------

I didn't see any other questions in the forum that address this. Can you please guide me on how to fix this error? I'm running GATK 2.1-13.

Thanks,

Sheena


Best Answer

  • ebanks Broad Institute Posts: 687 Member, Administrator, GATK Dev, Broadie, Moderator, DSDE Dev, GP Member admin
    Answer ✓

    Hi Sheena,

    Can you please confirm that your BAM is valid by running it through Picard's ValidateSamFile? If it is okay, you may need to upload a small portion of it (extracted with PrintReads) so that I can reproduce the error locally.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
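
The two steps suggested above (validate the BAM, then cut out a small slice for reproduction) can be sketched as follows. This is only an illustration in Python that assembles the command lines; the jar locations, the interval `20:1-1000000`, and the output name `subset.bam` are assumptions and not taken from the thread. The interval in particular is a placeholder, since the log does not show where the failure occurs.

```python
import shlex


def validate_cmd(bam, picard_jar="ValidateSamFile.jar"):
    """Command line for Picard ValidateSamFile in summary mode.

    picard_jar is an assumed location for the Picard tool jar.
    """
    return ["java", "-jar", picard_jar, f"INPUT={bam}", "MODE=SUMMARY"]


def subset_cmd(bam, reference, interval, out_bam, gatk_jar="GenomeAnalysisTK.jar"):
    """Command line for GATK PrintReads restricted to one interval.

    gatk_jar is an assumed location for the GATK jar.
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "PrintReads",
        "-R", reference,
        "-I", bam,
        "-L", interval,
        "-o", out_bam,
    ]


def show(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Input paths are the ones from the original post; interval and output are hypothetical.
bam = "/home/sheenams/gatk_test/LMG-206.GATKinitialrmdup.srt.bam"
ref = "/home/genetics/Genomes/gatk-bundle/human_g1k_v37.fasta"

print(show(validate_cmd(bam)))
print(show(subset_cmd(bam, ref, "20:1-1000000", "subset.bam")))
```

To actually run a command rather than print it, the returned list can be passed to `subprocess.run(cmd, check=True)`.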

Answers

  • sheenams Posts: 9 Member

    I think I found the mistake causing this error. I had run ReduceReads before the realigner. Moving it later in the pipeline got rid of the error. Thanks.

  • Ashu Posts: 21 Member

    Hi Eric,

    I got a similar error with the BaseRecalibrator in GenomeAnalysisTK-2.2-2-gf44cc4e. I also ran Picard's ValidateSamFile, and it reports no errors.
    The GATK error was: ##### ERROR MESSAGE: Key 2006 is too large for dimension 2 (max is 2001)

    What exactly does this error mean? What key is it talking about? And how can I fix it?

    Ashu
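
On what the message means: the stack trace in the original post shows the exception being raised from `NestedIntegerArray.put`, which suggests a fixed-size, multi-dimensional table addressed by integer keys, where one key fell outside the size allocated for its dimension. The sketch below is not GATK's code. It is a toy Python class, with hypothetical dimension sizes, that shows how a bounds check of this kind yields a message of the same form. It assumes dimensions are numbered from 0 and that "max" is the largest allowed key. It says nothing about which covariate a given dimension corresponds to.

```python
class NestedIntArray:
    """Toy fixed-size table addressed by a tuple of integer keys (illustration only)."""

    def __init__(self, *dimensions):
        self.dimensions = dimensions  # number of allowed keys per dimension
        self.data = {}

    def put(self, value, *keys):
        if len(keys) != len(self.dimensions):
            raise ValueError("expected one key per dimension")
        # Reject any key that does not fit in its dimension.
        for i, (key, size) in enumerate(zip(keys, self.dimensions)):
            if key >= size:
                raise IndexError(
                    f"Key {key} is too large for dimension {i} (max is {size - 1})"
                )
        self.data[keys] = value


def try_put(table, value, *keys):
    """Return 'ok' on success, or the error text if a key is out of range."""
    try:
        table.put(value, *keys)
        return "ok"
    except IndexError as err:
        return str(err)


# Hypothetical sizes: the third dimension accepts keys 0..2001.
table = NestedIntArray(4, 94, 2002)
print(try_put(table, 1, 0, 30, 2001))  # ok
print(try_put(table, 1, 0, 30, 2002))  # Key 2002 is too large for dimension 2 (max is 2001)
```

In this picture, the fix is not to enlarge the table but to find out why the input produces a key larger than the tool expects, as with the pipeline-ordering mistake reported earlier in the thread.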
