BaseRecalibrator: expand BUFFER?

mdeloger Member Posts: 4

I get an error at the BaseRecalibrator step. Even after increasing the memory allocated to the job, I still get the same error, and I found nothing about it in previously published posts:

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

org.broadinstitute.sting.utils.exceptions.ReviewedStingException: Insufficient buffer size for Xs overhanging genome -- expand BUFFER
at org.broadinstitute.sting.gatk.datasources.providers.ReferenceView.getReferenceBases(
at org.broadinstitute.sting.gatk.datasources.providers.ReadReferenceView$Provider.getBases(
at org.broadinstitute.sting.gatk.contexts.ReferenceContext.fetchBasesFromProvider(
at org.broadinstitute.sting.gatk.contexts.ReferenceContext.getBases(
at org.broadinstitute.sting.gatk.walkers.bqsr.BaseRecalibrator.calculateIsSNP(
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsMap.apply(
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(
at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(
at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.commandline.CommandLineProgram.start(
at org.broadinstitute.sting.gatk.CommandLineGATK.main(

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.6-5-gba531bd):
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR MESSAGE: Insufficient buffer size for Xs overhanging genome -- expand BUFFER
ERROR ------------------------------------------------------------------------------------------

Thank you in advance

Best Answer


  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,647 admin

    Hmm, this is new. Can you post your full command line? Is there anything special or out of the ordinary about your reference or your data?

    Geraldine Van der Auwera, PhD

  • mdeloger Member Posts: 4

    Here is my full command line:
    java -jar /mnt/seq2/seq2/SANDBOX_DIR/MARC/Utils/GenomeAnalysisTK-2.6-5-gba531bd/GenomeAnalysisTK.jar -T BaseRecalibrator -nct 8 -I /mnt/seq2/seq2/SANDBOX_DIR/MARC/ERCC1/H55Sophie/rawdata/A549A/snp_gatk/untrimmed/no_rmdup/A549A_realigned.bam -R /share/apps/data/hg19/bwa_0.7.5/hg19.fa -knownSites /share/apps/data/dbsnp_135.hg19.vcf -U ALLOW_N_CIGAR_READS --read_buffer_size 3000000 -o /mnt/seq2/seq2/SANDBOX_DIR/MARC/ERCC1/H55Sophie/rawdata/A549A/snp_gatk/untrimmed/no_rmdup/A549A_realigned_recal.grp

    The previous steps, "RealignerTargetCreator" and "IndelRealigner", worked just fine with the same reference and the same data.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,647 admin

    I see. Do you get the same error if you run without specifying --read_buffer_size?

  • mdeloger Member Posts: 4

    I increased "--read_buffer_size" step by step from the default value to try to overcome this error, but I get the same result even without this argument.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie Posts: 11,647 admin

    Ah, never mind the read buffer; I looked in the code and it's a different buffer that's involved, for storing reference context info. There's a hard-coded limit of 10000 bases.

    I assume you're working with RNAseq data, since you're using the -U ALLOW_N_CIGAR_READS parameter? It may be that you have some long stretches of Ns that are causing the buffer overflow. At this time I don't think we have any workaround for this, so you might need to skip recalibration entirely (or find a way to exclude very long strings of Ns).
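
One way to check whether very long N stretches are present is to scan the CIGAR strings of the aligned reads. The following is only a rough sketch, not part of GATK: it assumes that "Ns" here means N (skipped region) CIGAR operations, and that a read whose alignment spans more reference bases than the 10000-base limit mentioned above is a plausible trigger for the error. That link is an assumption and is not confirmed in this thread. The function names are made up for illustration.

```python
import re

# Hard-coded reference-context limit quoted in the post above.
BUFFER_LIMIT = 10000

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operators that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def cigar_stats(cigar):
    """Return (reference_span, longest_N_operation) for a CIGAR string."""
    span = 0
    longest_n = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in REF_CONSUMING:
            span += n
        if op == "N":
            longest_n = max(longest_n, n)
    return span, longest_n


def long_span_reads(sam_lines, limit=BUFFER_LIMIT):
    """Yield (name, contig, pos, span, longest_n) for reads spanning more than limit."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[5] == "*":
            continue  # malformed record or no alignment
        span, longest_n = cigar_stats(fields[5])
        if span > limit:
            yield fields[0], fields[2], int(fields[3]), span, longest_n
```

To try it on real data, the SAM records printed by `samtools view` on the BAM file can be passed in as `sam_lines` (for example by iterating over `sys.stdin` in a small wrapper script). If it reports any reads, those are candidates to inspect or filter out before recalibration; if it reports none, long N operations are probably not the explanation.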

    We are looking into formulating specific recommendations for using GATK on RNAseq data but that will take a little while longer. In the meantime we're interested in hearing about the experiences of people who have been trying this on their own. If you or your colleagues have any observations you'd like to share with us on this topic we'd be happy to hear them.

  • mdeloger Member Posts: 4

    Oh okay, thanks.
    I'm happy to share our experience on that, but I don't think this is due to Ns, because the data is preprocessed to keep only high-quality reads, and reads with Ns are trimmed.

    Have a good day
