
DepthOfCoverage memory usage

igor (New York, Member)
edited November 2015 in Ask the GATK team

Is there a way to manage DepthOfCoverage memory usage? I am having problems when I give it a large intervals file. I can successfully run other tools like RealignerTargetCreator, IndelRealigner, and BaseRecalibrator, which seem like they would be more memory-intensive. I can also run DepthOfCoverage with --omitIntervalStatistics --omitLocusTable --omitDepthOutputAtEachBase. However, running it with just --omitDepthOutputAtEachBase gives me a memory error:
##### ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
Is there any way to optimize that?
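For reference, a GATK 3.x DepthOfCoverage invocation along these lines might look like the sketch below. The file names and the 8 GB heap are placeholders, not values from the original post:

```shell
# Sketch of a GATK 3.x run (paths and heap size are illustrative)
java -Xmx8g -jar GenomeAnalysisTK.jar \
    -T DepthOfCoverage \
    -R reference.fasta \
    -I sample.bam \
    -L targets.interval_list \
    -o sample.coverage \
    --omitDepthOutputAtEachBase
```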

Answers

  • tommycarstensen (United Kingdom, Member)

    You can give GATK 4 GB of memory like this:

    java -Xmx4g -jar GenomeAnalysisTK.jar ...
    
  • igor (New York, Member)

    I already assign a fixed amount of memory to every tool, much larger than the BAM sizes, so I can do that. My question is why DepthOfCoverage is the only one that has problems, and only with certain parameters.

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @igor
    Hi,

    I suspect you have some regions of very high coverage. DepthOfCoverage performs badly on those regions because it does not apply any downsampling; the other tools handle them fine because they downsample by default.

    -Sheila
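As a side note, in GATK 3.x the engine-level downsampling behavior of other walkers can be controlled explicitly; a sketch, with illustrative values (the tool, coverage cap, and file names are assumptions, not from this thread):

```shell
# Engine-level downsampling flags in GATK 3.x (values are illustrative):
# -dt BY_SAMPLE caps coverage per sample, -dcov sets the cap.
java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R reference.fasta \
    -I sample.bam \
    -dt BY_SAMPLE -dcov 500 \
    -o recal_data.table
```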

  • igor (New York, Member)

    I explicitly turn off downsampling on all tools. Also, DepthOfCoverage works, but only with certain parameters. That is the troubling part for me.

  • So what you're really saying is that asking DoC to calculate Interval Statistics and/or Locus Tables is taking too much memory? This seems reasonable to me. The help for --omitLocusTable says that you're deciding whether to calculate "per-sample per-depth counts of loci". You've not described your data at all, but it's not hard to imagine that computing a depth histogram for every sample simultaneously might require some memory.
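To make the memory argument concrete: the table that --omitLocusTable skips is conceptually a per-sample histogram of depth versus number of loci, so its size scales with the number of samples times the maximum observed depth. The sketch below is a back-of-the-envelope illustration, not GATK code, and the million-fold depth is a hypothetical pathological pile-up:

```python
# Illustrative sketch (not GATK code): estimate the number of counters
# in a per-sample per-depth count table of loci.

def locus_table_cells(n_samples: int, max_depth: int) -> int:
    """Counters needed for a per-sample per-depth table: one bin per
    depth value 0..max_depth, for each sample."""
    return n_samples * (max_depth + 1)

# Even with a single sample, one pathological region with ~1,000,000x
# coverage (e.g. a PCR artifact pile-up) forces ~1e6 counters:
print(locus_table_cells(1, 1_000_000))  # 1000001
```

This is consistent with the observation in the thread that a single sample can still blow the heap when some regions have extreme coverage and downsampling is off.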

  • igor (New York, Member)

    I am only using one sample at a time.
