
BaseRecalibrator Key Error 2002

sheenams Posts: 9 Member
edited October 2012 in Ask the team

I'm trying to run the BaseRecalibrator tool on my data and am getting the following error:

INFO 14:58:17,399 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,400 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-13-g1706365, Compiled 2012/10/12 19:21:06
INFO 14:58:17,400 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 14:58:17,400 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 14:58:17,401 HelpFormatter - Program Args: -T BaseRecalibrator -I /home/sheenams/gatk_test/LMG-206.GATKinitialrmdup.srt.bam -R /home/genetics/Genomes/gatk-bundle/human_g1k_v37.fasta -knownSites /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf -knownSites /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf -knownSites /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf -o /home/sheenams/gatk_test/LMG-206.recal_data.csv -log /home/sheenams/gatk_test/LMG-206.gatk_log
INFO 14:58:17,401 HelpFormatter - Date/Time: 2012/10/17 14:58:17
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,401 HelpFormatter - ---------------------------------------------------------------------------------
INFO 14:58:17,407 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf to be VCF
INFO 14:58:17,409 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf to be VCF
INFO 14:58:17,410 ArgumentTypeDescriptor - Dynamically determined type of /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf to be VCF
INFO 14:58:17,414 GenomeAnalysisEngine - Strictness is SILENT
INFO 14:58:17,463 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 14:58:17,479 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
INFO 14:58:17,487 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/dbsnp_135.b37.vcf
WARN 14:58:17,574 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A
INFO 14:58:17,575 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/Mills_and_1000G_gold_standard.indels.b37.sites.vcf
WARN 14:58:17,589 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer
INFO 14:58:17,590 RMDTrackBuilder - Loading Tribble index from disk for file /home/genetics/Genomes/gatk-bundle/1000G_phase1.indels.b37.vcf
WARN 14:58:17,603 VCFHeader - Found GL format, but no PL field. As the GATK now only manages PL fields internally automatically adding a corresponding PL field to your VCF header
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AC because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Alternate Allele Count' but standard is 'Allele count in genotypes, for each ALT allele, in the same order as listed'
WARN 14:58:17,603 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has INTEGER but standard is A -- descriptions disagree; header has 'Global Allele Frequency based on AC/AN' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed'
INFO 14:58:18,093 BaseRecalibrator - The covariates being used here:
INFO 14:58:18,093 BaseRecalibrator - ReadGroupCovariate
INFO 14:58:18,093 BaseRecalibrator - QualityScoreCovariate
INFO 14:58:18,094 BaseRecalibrator - ContextCovariate
INFO 14:58:18,094 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
INFO 14:58:18,094 BaseRecalibrator - CycleCovariate
INFO 14:58:18,136 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
INFO 14:58:18,137 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
INFO 14:58:35,886 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 2002 is too large for dimension 2 (max is 2001)
    at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:77)
    at org.broadinstitute.sting.gatk.walkers.bqsr.AdvancedRecalibrationEngine.updateDataForPileupElement(AdvancedRecalibrationEngine.java:97)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:244)
    at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:106)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:65)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:62)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:265)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:93)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.1-13-g1706365):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Key 2002 is too large for dimension 2 (max is 2001)
ERROR ------------------------------------------------------------------------------------------

I didn't see any other questions in the forum that addressed this. Can you please guide me on how to fix this error? I'm running GATK 2.1-13 (v2.1-13-g1706365, per the log above).

Thanks,

Sheena
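
For readers trying to parse the message itself: the stack trace shows the exception being thrown from NestedIntegerArray.put, which suggests a bounds check on a fixed-size, multi-dimensional table. The Python sketch below is not GATK's code; it is a toy stand-in that assumes 0-based dimension numbering and that "max" means the largest valid key. The table shape (4, 50, 2002) and the helper try_put are hypothetical and chosen only so the message matches.

```python
class NestedIntegerArray:
    """Toy stand-in (NOT the GATK class) for a fixed-shape lookup table."""

    def __init__(self, *dimensions):
        # dimensions[i] = number of valid keys along axis i (keys are 0-based).
        self.dimensions = dimensions
        self.data = {}

    def put(self, value, *keys):
        if len(keys) != len(self.dimensions):
            raise ValueError(
                f"expected {len(self.dimensions)} keys, got {len(keys)}"
            )
        for axis, (key, size) in enumerate(zip(keys, self.dimensions)):
            if key < 0:
                raise IndexError(f"Key {key} is negative for dimension {axis}")
            if key >= size:
                # Same wording as the error in the post (assumed semantics).
                raise IndexError(
                    f"Key {key} is too large for dimension {axis} "
                    f"(max is {size - 1})"
                )
        self.data[keys] = value


def try_put(table, value, *keys):
    """Hypothetical helper: return None on success, else the error text."""
    try:
        table.put(value, *keys)
    except IndexError as err:
        return str(err)
    return None


# Hypothetical shape: the third axis (dimension 2) accepts keys 0..2001.
table = NestedIntegerArray(4, 50, 2002)
print(try_put(table, 1, 0, 30, 2001))  # within bounds
print(try_put(table, 1, 0, 30, 2002))  # one past the largest valid key
```

Under those assumptions, the error would mean that a key computed for the third dimension of the table came out one larger than the table was sized for. Which covariate that dimension corresponds to is not stated in the log.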
