# BaseRecalibrator: expand BUFFER?

Posts: 4Member

Hi, I get an error at the BaseRecalibrator step. Increasing the memory allocated to the job does not help, and I found nothing relevant in previously published posts:

##### ERROR stack trace

    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Insufficient buffer size for Xs overhanging genome -- expand BUFFER
        at org.broadinstitute.sting.gatk.datasources.providers.ReferenceView.getReferenceBases(ReferenceView.java:121)
        at org.broadinstitute.sting.gatk.datasources.providers.ReadReferenceView$Provider.getBases(ReadReferenceView.java:87)
        at org.broadinstitute.sting.gatk.contexts.ReferenceContext.fetchBasesFromProvider(ReferenceContext.java:145)
        at org.broadinstitute.sting.gatk.contexts.ReferenceContext.getBases(ReferenceContext.java:189)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateIsSNP(BaseRecalibrator.java:335)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:253)
        at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.map(BaseRecalibrator.java:132)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:228)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(TraverseReadsNano.java:216)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274)
        at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
        at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

##### ERROR ------------------------------------------------------------------------------------------

Geraldine Van der Auwera, PhD

• Posts: 4Member

Here is my full command line:

    java -jar /mnt/seq2/seq2/SANDBOX_DIR/MARC/Utils/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar \
        -T BaseRecalibrator \
        -nct 8 \
        -I /mnt/seq2/seq2/SANDBOX_DIR/MARC/ERCC1/H55Sophie/rawdata/A549A/snp_gatk/untrimmed/no_rmdup/A549A_realigned.bam \
        -R /share/apps/data/hg19/bwa_0.7.5/hg19.fa \
        -knownSites /share/apps/data/dbsnp_135.hg19.vcf \
        -U ALLOW_N_CIGAR_READS \
        --read_buffer_size 3000000 \
        -o /mnt/seq2/seq2/SANDBOX_DIR/MARC/ERCC1/H55Sophie/rawdata/A549A/snp_gatk/untrimmed/no_rmdup/A549A_realigned_recal.grp

The previous steps, "RealignerTargetCreator" and "IndelRealigner", worked just fine with the same reference and the same data.

I see. Do you get the same error if you run without specifying --read_buffer_size?

Geraldine Van der Auwera, PhD

• Posts: 4Member

I increased "--read_buffer_size" step by step from the default value to try to overcome this error, but I get the same result even without this argument.

Ah, never mind the read buffer; I looked in the code and it's a different buffer that's involved, for storing reference context info. There's a hard-coded limit of 10000 bases.

I assume you're working with RNAseq data, since you're using the -U ALLOW_N_CIGAR_READS parameter? It may be that you have some long stretches of Ns that are causing the buffer overflow. At this time I don't think we have any workaround for this, so you might need to skip recalibration entirely (or find a way to exclude very long strings of Ns).
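One way to test this hypothesis before skipping recalibration is to look for reads whose alignment covers more reference bases than the limit mentioned above. The following is only a rough sketch, not GATK code: it assumes the relevant quantity is the reference span implied by each read's CIGAR string (the M, D, N, = and X operations), and the function names and the `REFERENCE_BUFFER_LIMIT` constant are made up for illustration.

```python
import re

# Limit quoted in this thread; assumed here, and possibly different in other GATK versions.
REFERENCE_BUFFER_LIMIT = 10000

# One CIGAR operation: a length followed by an operation letter (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def longest_skipped_region(cigar):
    """Return the length of the longest N (skipped region) operation, or 0 if none."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"), default=0)


def reference_span(cigar):
    """Return the number of reference bases covered by the alignment."""
    # Only M, D, N, = and X consume reference positions.
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def suspicious_reads(sam_lines, limit=REFERENCE_BUFFER_LIMIT):
    """Yield (read_name, cigar, span) for SAM records whose reference span exceeds limit."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # malformed or unaligned record
            continue
        span = reference_span(fields[5])
        if span > limit:
            yield fields[0], fields[5], span


# Small in-memory example; real input would be SAM text lines.
example = [
    "@HD\tVN:1.4\n",
    "r1\t0\tchr1\t100\t60\t50M12000N50M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t0\tchr1\t100\t60\t100M\t*\t0\t0\tACGT\tIIII\n",
]
for name, cigar, span in suspicious_reads(example):
    print(name, cigar, span, sep="\t")
```

To run this on real data, `suspicious_reads` can be given any iterable of SAM text lines, for example `sys.stdin` when the script is fed from `samtools view`. If no reads are reported, long N stretches in the CIGAR strings are probably not the cause and the problem lies elsewhere.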

We are looking into formulating specific recommendations for using GATK on RNAseq data but that will take a little while longer. In the meantime we're interested in hearing about the experiences of people who have been trying this on their own. If you or your colleagues have any observations you'd like to share with us on this topic we'd be happy to hear them.

Geraldine Van der Auwera, PhD

• Posts: 4Member

Oh okay, thanks. I'm happy to share our experience on that, but I don't think this is due to Ns, because the data is preprocessed to keep only high-quality reads, and reads with Ns are trimmed.

Have a good day