ReduceReads Memory Usage


I've been trying to get ReduceReads working in a pipeline I've made that incorporates GATK tools to call variants in RNA-seq data.
After performing indel realignment and base recalibration, I'm trying to use ReduceReads prior to calling variants with UnifiedGenotyper.

I've been using GATK version 2.3.9. When I run ReduceReads on a 1.7 GB .bam file, I need to allocate 100 GB of memory for the process to complete; otherwise I get an error saying that I didn't provide enough memory to run the program and that I should adjust the maximum heap size using the -Xmx option.

The problem isn't that ReduceReads doesn't work. It does, but of the 100 GB I set aside, it uses 80-90 GB. This means I can't run more than one job at a time, given the memory constraints of the machine I'm using.
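To make the constraint concrete, here is a minimal sketch of the arithmetic. The machine's total memory is not stated in this post, so the 128 GB figure below is purely a hypothetical assumption, as are the function and parameter names.

```python
def max_concurrent_jobs(total_mem_gb: float, per_job_gb: float,
                        reserve_gb: float = 0.0) -> int:
    """Return how many jobs fit in memory at the same time.

    total_mem_gb: total RAM on the machine (hypothetical value below).
    per_job_gb:   heap given to each job via -Xmx.
    reserve_gb:   memory kept free for the OS and other processes.
    """
    if per_job_gb <= 0:
        raise ValueError("per_job_gb must be positive")
    usable_gb = total_mem_gb - reserve_gb
    # Floor division: a job either fits completely or not at all.
    return max(0, int(usable_gb // per_job_gb))


if __name__ == "__main__":
    # Assumed 128 GB machine; 100 GB is the -Xmx value from this post.
    print(max_concurrent_jobs(128, 100))
    # Even at the observed 80-90 GB of actual usage, only one job fits.
    print(max_concurrent_jobs(128, 90))
```

Under that assumed machine size, the per-job heap would have to drop to 64 GB or less before two jobs could run side by side.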

I've been looking through the GATK forum and understand that it may be a GATK version issue, but I've tried GATK 2.5.2 ReduceReads for this step and it still requires 70-80 GB of memory.

Can anyone provide any clues as to what I may be doing wrong, or whether I can do something to make it use less memory so I can run multiple jobs simultaneously?

The command I'm using is:

java -Xmx100g -jar /NMC/LCR/GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T ReduceReads -R /SCRATCH/LCR/BWAIndex_hg19/genome.fa -I out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam -o out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam.reducedReads.bam

Thanks in advance,



  • ebanks, Broad Institute (Member, Broadie, Dev)

    Hi Alex,

    When you say that ReduceReads works on your RNA-seq data, do you mean that it is actually producing correct results, or just that it's not failing with an error? I am nearly 100% sure that it will not produce correct results with RNA-seq data and shouldn't be used (but am happy to be proven wrong). If you are unsure, then I'd recommend not running RR on your data.

    We are planning on producing a best practices recommendations document for RNA-seq processing with the GATK, but unfortunately we don't have anything official just yet.

  • agout (Member)

    Hi Eric,

    Thanks for this feedback - much appreciated! I've left ReduceReads out of the pipeline.

    It would be great to have a GATK - RNA-seq processing best practices document at hand. Look forward to it.

    Best Regards,
