The current GATK version is 3.6-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Powered by Vanilla. Made with Bootstrap.
Last chance to register for the GATK workshop next week in Basel, Switzerland! http://www.sib.swiss/training/upcoming-training-events/training/gatk-workshop-lecture

ReduceReads Memory Usage

agoutagout Posts: 4Member

Hi,

I've been trying to get ReduceReads working in a pipeline I've made that incorporates GATK tools to call variants in RNA-seq data.
After performing indel realignment and base recalibration I'm trying to use ReduceReads prior to calling variants using Unified Genotyper.

I've been using GATK version 2.3.9. When I try to use ReduceReads on a 1.7Gb .bam file, I need to set aside 100Gb memory to perform the operation for the process to complete (otherwise I'll get an error saying I didn't provide enough memory to run the program and to adjust the maximum heap size using the -Xmx option etc).

The problem isn't that ReduceReads doesn't work - it does, however of the 100Gb I set aside, it uses 80-90Gb of it. This means I can't run more than one job at a time due to the constraints of the machine I'm using etc.

I've been looking through the GATK forum and understand it may be a GATK version issue, though I've tried using GATK 2.5.2 ReduceReads for this step and it still requires 70-80Gb memory.

Can anyone provide any clues as to what I may be doing wrong? or whether I can do something to make it use less memory so I can run multiple jobs simultaneously?

The command I'm using is:

java -Xmx100g -Djava.io.tmpdir=/RAW/javatmp/ -jar /NMC/LCR/GenomeAnalysisTK-2.5-2-gf57256b/GenomeAnalysisTK.jar -T ReduceReads -R /SCRATCH/LCR/BWAIndex_hg19/genome.fa -I out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam -o out.bam.sorted.bam.readGroups.bam.rmdup.bam.realigned.bam.recalibrated.bam.reducedReads.bam

Thanks in advance,

Alex

Comments

  • ebanksebanks Broad InstitutePosts: 698Member, Administrator, Broadie, Moderator, Dev admin

    Hi Alex,

    When you say that Reduce Reads works on your RNA-seq data do you mean that it is actually producing correct results or that it's just not failing with an error? Because I am nearly 100% sure that it will not produce the correct results with RNA-seq data and shouldn't be used (but am happy to be wrong). If you are unsure then I'd recommend not running RR on your data.

    We are planning on producing a best practices recommendations document for RNA-seq processing with the GATK, but unfortunately we don't have anything official just yet.

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • agoutagout Posts: 4Member

    Hi Eric,

    Thanks for this feedback - much appreciated! I've left RediceReads out of the pipeline.

    It would be great to have a GATK - RNA-seq processing best practices document at hand. Look forward to it.

    Best Regards,
    Alex

Sign In or Register to comment.