
[GATK PrintReads error]Key 1036 is too large for dimension 2 (max is 1001)

sirmark Posts: 4 Member
edited March 2013 in Ask the GATK team

Hi all! I'm trying to complete my first GATK run, following the steps listed in the "EXECUTION STEP" section below.

Please tell me whether the steps are correct overall.

------------------------------------------------------------------ERRORS----------------------------------------------------------------

Step 4.1 does not run without -maxCycle 1500.

When I try to run step 4.2, I get the following error:

ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Key 1036 is too large for dimension 2 (max is 1001)
    at org.broadinstitute.sting.utils.collections.NestedIntegerArray.put(NestedIntegerArray.java:128)
    at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.parseAllCovariatesTable(RecalibrationReport.java:157)
    at org.broadinstitute.sting.utils.recalibration.RecalibrationReport.<init>(RecalibrationReport.java:68)
    at org.broadinstitute.sting.utils.recalibration.BaseRecalibration.<init>(BaseRecalibration.java:74)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.setBaseRecalibration(GenomeAnalysisEngine.java:217)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:253)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.3-9-ge5ebf34):
ERROR
ERROR Please visit the wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Key 1036 is too large for dimension 2 (max is 1001)
ERROR ------------------------------------------------------------------------------------------

---------------------------------------------------------------EXECUTION STEP---------------------------------------------------------

2 MARKING PCR DUPLICATES

java -Xmx4g -Djava.io.tmpdir=/tmp -jar MarkDuplicates.jar INPUT=M9.bam OUTPUT=m9.marked.bam METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

3 LOCAL REALIGNMENT AROUND INDELS

3.1

java -Xmx4g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -known dbsnp_137.hg19.vcf -o m9.list -I m9.marked.bam

3.2

java -Xmx4g -Djava.io.tmpdir=/tmp -jar GenomeAnalysisTK.jar -I m9.marked.bam -R ucsc.hg19.fasta -T IndelRealigner -targetIntervals m9.list -known dbsnp_137.hg19.vcf -o m9.marked.realigned.bam

3.2.1

java -Xmx4g -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ReduceReads -I m9.marked.realigned.bam -o m9.marked.realigned.reduce.bam

3.3

java -Djava.io.tmpdir=/tmp/flx-auswerter -Xmx4g -jar FixMateInformation.jar INPUT=m9.marked.realigned.reduce.bam OUTPUT=m9.marked.realigned.reduce.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true

4 QUALITY SCORE RECALIBRATION

4.1

java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R ucsc.hg19.fasta -knownSites dbsnp_137.hg19.vcf -I m9.marked.realigned.reduce.fixed.bam -T BaseRecalibrator -maxCycle 1500 -cov ReadGroupCovariate -cov QualityScoreCovariate -o m9.recal_data.grp

4.2 ***********************************

java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta -I m9.marked.realigned.reduce.fixed.bam -BQSR m9.recal_data.grp -o m9.marked.realigned.reduce.fixed.recal.bam

5 SNP CALLING

5.1

java -Xmx4g -jar GenomeAnalysisTK.jar -nct 4 --num_threads 4 -glm BOTH -R ucsc.hg19.fasta -T UnifiedGenotyper --sample_ploidy 5 -I m9.marked.realigned.reduce.fixed.bam -D dbsnp_137.hg19.vcf -o m9.vcf -stand_call_conf 20.0 -stand_emit_conf 20.0 -A DepthOfCoverage -A AlleleBalance


Comments

  • Geraldine_VdAuwera Posts: 6,089 Administrator, GATK Developer

    Hi there,

    Your steps look fine in general.

    The issue you encountered ("key too large") has been fixed in version 2.4. Just upgrade to the latest version and it should work.

    Geraldine Van der Auwera, PhD

  • sirmark Posts: 4 Member

    OK, I upgraded to 2.4-9 and now I get:

    ERROR MESSAGE: The maximum allowed value for the cycle is 500, but a larger cycle (518) was detected. Please use the --maximum_cycle_value argument to increase this value (at the expense of requiring more memory to run)

    But when I try to use the option this way:

    java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta -I m9.marked.realigned.reduce.fixed.bam -BQSR m9.recal_data.grp -maxCycle 600 -o m9.marked.realigned.reduce.fixed.recal.bam

    I get:

    ERROR MESSAGE: Argument with name 'maxCycle' isn't defined.

    If I use the option another way, like this:

    java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fasta --maximum_cycle_value 600 -I m9.marked.realigned.reduce.fixed.bam -BQSR m9.recal_data.grp -o m9.marked.realigned.reduce.fixed.recal.bam

    I get:

    ERROR MESSAGE: Argument with name 'maximum_cycle_value' isn't defined.

    Where is my mistake? I haven't found any information about the 'maximum_cycle_value' option.

    Thanks for the help!

  • Geraldine_VdAuwera Posts: 6,089 Administrator, GATK Developer

    You need to repeat the recalibration procedure. The --maximum_cycle_value argument belongs to BaseRecalibrator, not PrintReads. We'll make the error message clearer for the next release.

    Geraldine Van der Auwera, PhD
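
For readers following along, the advice above could be sketched roughly as follows. This is a dry run that only prints the two commands; it is not a tested recipe. The reference, dbSNP file, recalibration table and the value 600 are taken from the posts above. The input is assumed to be the unreduced BAM from step 3.2 (see the later note in this thread about reduced BAMs), and the output name m9.marked.realigned.recal.bam is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running GATK.
# MAX_CYCLE=600 is the value the poster tried, not a recommendation.
REF=ucsc.hg19.fasta
IN_BAM=m9.marked.realigned.bam            # assumed: unreduced output of step 3.2
RECAL=m9.recal_data.grp
OUT_BAM=m9.marked.realigned.recal.bam     # hypothetical output name
MAX_CYCLE=600

# Step 4.1: the cycle limit is raised here, on BaseRecalibrator.
recal_cmd() {
    echo "java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R $REF -I $IN_BAM -knownSites dbsnp_137.hg19.vcf --maximum_cycle_value $MAX_CYCLE -o $RECAL"
}

# Step 4.2: PrintReads only consumes the table; it takes no cycle argument.
print_reads_cmd() {
    echo "java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R $REF -I $IN_BAM -BQSR $RECAL -o $OUT_BAM"
}

recal_cmd
print_reads_cmd
```

The point of the sketch is only the placement of the flag: the recalibration table is regenerated with the higher limit, and PrintReads is then rerun unchanged against the new table.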

  • sirmark Posts: 4 Member

    After many tests, it turned out my problem was due to the ReduceReads step (3.2.1), which should be done after BaseRecalibrator and PrintReads and not before. Thanks for the support.

  • Geraldine_VdAuwera Posts: 6,089 Administrator, GATK Developer

    @sirmark, just to be clear, you should always do Base Recalibration before ReduceReads. Base Recalibration does not work on reduced bams.

    Geraldine Van der Auwera, PhD
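
The ordering described here might look like the sketch below when applied to the steps from the original post. It is a dry run that prints the commands in the order they would be executed. The tool names, reference, dbSNP file and m9.marked.realigned.bam come from the thread; the two output BAM names are hypothetical, and the FixMateInformation step is left out for brevity.

```shell
#!/bin/sh
# Dry-run sketch of the reordered steps: prints commands, runs nothing.
GATK="java -Xmx4g -jar GenomeAnalysisTK.jar"
REF=ucsc.hg19.fasta
REALIGNED=m9.marked.realigned.bam                   # output of step 3.2 in the post
RECAL_TABLE=m9.recal_data.grp
RECAL_BAM=m9.marked.realigned.recal.bam             # hypothetical name
REDUCED_BAM=m9.marked.realigned.recal.reduce.bam    # hypothetical name

pipeline() {
    # 1. Build the recalibration table from the full (unreduced) BAM.
    echo "$GATK -T BaseRecalibrator -R $REF -I $REALIGNED -knownSites dbsnp_137.hg19.vcf -o $RECAL_TABLE"
    # 2. Apply the table to the same unreduced BAM.
    echo "$GATK -T PrintReads -R $REF -I $REALIGNED -BQSR $RECAL_TABLE -o $RECAL_BAM"
    # 3. Only now reduce the recalibrated BAM.
    echo "$GATK -T ReduceReads -R $REF -I $RECAL_BAM -o $REDUCED_BAM"
}

pipeline
```

Compared with the original post, the only change is that ReduceReads moves from step 3.2.1 to after step 4.2, so BaseRecalibrator never sees a reduced BAM.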
