
Mutect2 Java memory shortage: a case study.

Hi all,
I've recently been running into memory problems when running GATK3 Mutect2. I know updating to GATK4 might resolve some of them, but let's say, for educational purposes, that this is not an option for the moment.

The memory shortage always appears when Mutect2 reaches position 152,307,072 on chr1. I verified this manually with IGV and DepthOfCoverage, and, as expected, coverage in this region is high (around 1,000 reads).

The tool runs on a server with 160 GB of RAM and 40 processors. Of course, I would like to put as many of those resources as possible into this tool. The reference assembly is hg38.

What follows is a case study of varying several parameters. Memory usage is assessed with free -h.

**** EXAMPLE1: nct 30

An error pops up.

**** EXAMPLE2: nct 25, -L option enabled

Again, an error pops up.

**** EXAMPLE3: nct 15, -L option enabled

Again, an error pops up.

**** EXAMPLE4: nct 35, -L option enabled, after installing 50GB of swap

Again, an error pops up. Note that none of the swap memory was ever used!

**** SOLUTION1: nct 5, -L option enabled
No error pops up anymore, but the runtime is far too long.

**** SOLUTION2: nct 35, -L option enabled, -Xmx2048M, -Xms2048M
No error pops up anymore.

Could anyone explain to me:
- Why is there a memory shortage error while free -h clearly shows that not even 5% of the memory is used?
- Supposedly this has something to do with the nct option, but the parallelization documentation indicates that nct should not affect memory usage (https://software.broadinstitute.org/gatk/documentation/article.php?id=1975).
- Which Xmx and Xms values should be chosen so that everything runs fast and smoothly?
- Why is my swap memory never accessed?
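The heap question is easier to reason about with numbers. If the failure is a Java OutOfMemoryError, the ceiling that matters is the JVM's own maximum heap (-Xmx, or a default the JVM derives from physical RAM when it is unset), not the free memory reported by free -h; the heap cannot grow past that ceiling, so the OS never comes under pressure, which would also be consistent with swap never being touched. Below is a minimal, hypothetical Python helper (heap_per_jvm_gb and java_heap_flags are illustrative names, and the 16 GB reserve is an assumption, not GATK guidance) that divides the server's RAM across several concurrent single-threaded jobs and emits matching -Xmx/-Xms flags, mirroring SOLUTION2's choice of equal values.

```python
"""Rule-of-thumb JVM heap sizing for several concurrent GATK jobs.

Hypothetical helper: the reserve and job count are illustrative assumptions,
not official GATK recommendations.
"""


def heap_per_jvm_gb(total_ram_gb: float, concurrent_jvms: int, reserve_gb: float = 16.0) -> int:
    """Split physical RAM across JVMs, keeping `reserve_gb` for the OS and for
    JVM memory outside the heap (metaspace, thread stacks, GC structures)."""
    if concurrent_jvms < 1:
        raise ValueError("need at least one JVM")
    usable = total_ram_gb - reserve_gb
    if usable <= 0:
        raise ValueError("reserve exceeds total RAM")
    # Round down so the sum of all heaps never exceeds the usable budget.
    return int(usable // concurrent_jvms)


def java_heap_flags(heap_gb: int, pin_initial: bool = True) -> list:
    """Build -Xmx (maximum heap) and, optionally, an equal -Xms (initial heap)."""
    if heap_gb < 1:
        raise ValueError("heap must be at least 1 GB")
    flags = [f"-Xmx{heap_gb}g"]
    if pin_initial:
        flags.append(f"-Xms{heap_gb}g")
    return flags


if __name__ == "__main__":
    # The 160 GB server from the post, running 8 single-threaded jobs at once.
    heap = heap_per_jvm_gb(160, 8)
    print(heap, java_heap_flags(heap))  # prints: 18 ['-Xmx18g', '-Xms18g']
```

Whether Mutect2 actually needs 18 GB per job at a ~1,000x region is something to confirm by measurement; the point of the sketch is only that each JVM's heap is a fixed budget you choose, independent of how much RAM the machine has free.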

Answers

  • SkyWarrior (Turkey, Member)

    -nct does not work properly. It has been stated many times here on the forums that -nct does not function the way it is supposed to, and performance gains are usually hit and miss or negligible. What is recommended instead is to use Queue or WDL to parallelize with a scatter-gather approach.

    My take-home from all my trials with -nct is that it is worthless for HC and Mutect. It is only usable for BQSR, and using more than 8 threads is usually disastrous due to Java overhead.

    TL;DR

    Java alone sucks at multithreading; that's why Apache Spark is considered the way parallelism should be done in GATK4.
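To make the scatter-gather suggestion concrete, here is a minimal Python sketch of the scatter step: it cuts a list of (contig, start, end) intervals into shards of roughly equal length and builds one single-threaded GATK3 MuTect2 command per shard, each restricted with -L to its own interval file. The helper names, shard file names, and reference/BAM paths are hypothetical placeholders, and real pipelines normally cut at gaps between targets rather than purely by length. Writing each shard's lines to its .intervals file is left out; the per-shard VCFs would then be gathered, for example with Picard's MergeVcfs, and in practice Queue or a WDL scatter block would schedule the jobs.

```python
"""Scatter step of a scatter-gather MuTect2 run (illustrative sketch).

Intervals are (contig, start, end) tuples with 1-based inclusive coordinates.
All file names below are hypothetical placeholders.
"""


def split_intervals(intervals, n_shards):
    """Cut intervals into at most n_shards groups of roughly equal total length.

    Long intervals are split across shard boundaries, so every shard except
    the last holds exactly `target` bases.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    total = sum(end - start + 1 for _, start, end in intervals)
    target = max(1, -(-total // n_shards))  # ceiling division
    shards, current, filled = [], [], 0
    for contig, start, end in intervals:
        while start <= end:
            stop = min(end, start + (target - filled) - 1)
            current.append((contig, start, stop))
            filled += stop - start + 1
            start = stop + 1
            if filled == target:  # shard is full, start a new one
                shards.append(current)
                current, filled = [], 0
    if current:
        shards.append(current)
    return shards


def interval_lines(shard):
    """Render a shard as GATK-style 'contig:start-end' lines for an -L file."""
    return [f"{contig}:{start}-{end}" for contig, start, end in shard]


def mutect2_command(shard_index, heap_gb=4):
    """GATK3 MuTect2 argv for one shard; deliberately no -nct."""
    name = f"shard_{shard_index:03d}"
    return [
        "java", f"-Xmx{heap_gb}g", "-jar", "GenomeAnalysisTK.jar",
        "-T", "MuTect2",
        "-R", "reference_hg38.fasta",  # hypothetical path
        "-I:tumor", "tumor.bam",       # hypothetical path
        "-I:normal", "normal.bam",     # hypothetical path
        "-L", f"{name}.intervals",
        "-o", f"{name}.vcf",
    ]


if __name__ == "__main__":
    regions = [("chr1", 1, 1000), ("chr1", 2001, 2600)]
    for i, shard in enumerate(split_intervals(regions, 4)):
        print(interval_lines(shard))
        print(" ".join(mutect2_command(i)))
```

With 40 cores, this trades one multi-threaded JVM for many independent single-threaded ones, each with its own small heap, which sidesteps the -nct behaviour described above.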
