Reducing memory footprint for GenotypeGVCFs (gatk 18.104.22.168). TMP_DIR an issue?
I'm running GenotypeGVCFs 22.214.171.124 on a SLURM cluster and having great difficulty determining how much memory it needs: I constantly run out. A predictor for the memory footprint would be useful. Moderate allocations such as 120GB or 140GB of RAM are insufficient, even though my jobs' working sets are in the hundreds-of-MBs ballpark. Asking for more RAM causes long job scheduling delays.
My setup: Following GATK best practices, I first run HaplotypeCaller in GVCF mode for each sample, then import my ~2400 samples in batches of 200 into a GenomicsDB over a 1Mbp region. The last step, GenotypeGVCFs, just blows up in RAM usage. I'm working with sunflower DNA (>3Gbp genome, diploid). The data has been aligned from paired-end Illumina sequencing, filtered, markdup'd, and sorted, with 5x coverage on average. It's naturally messy, however.
Things I've tried to reduce the memory footprint:
- I've tried limiting the Java heap size with -Xmx to 4GB less than my allocation limit, e.g. if I ask for a 140GB job, I'll give 136GB to Java. I figured that would be a very conservative buffer to take OOMKILL out of the picture.
- Reducing the working set, i.e. splitting the region of interest for each unit of work into progressively finer intervals. I'm down to 1Mbp regions now, which is already very inconvenient.
- I'm not even using any of the -nt options. Just using the default single data processing thread.
- I haven't tried --use-new-qual yet, but I plan to (and I'll report back).
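For reference, the first two items can be scripted roughly as follows. This is only a sketch of the bookkeeping described above, not anything provided by GATK: the helper names `heap_flag` and `split_region` are hypothetical, and `chr01` is a placeholder contig name. It assumes intervals are written as 1-based inclusive `contig:start-end` strings.

```python
from typing import List


def heap_flag(alloc_gb: int, headroom_gb: int = 4) -> str:
    """Return a JVM -Xmx flag that stays headroom_gb below the job allocation."""
    if headroom_gb < 0 or alloc_gb <= headroom_gb:
        raise ValueError("allocation must exceed the headroom")
    return f"-Xmx{alloc_gb - headroom_gb}g"


def split_region(contig: str, start: int, end: int, size: int) -> List[str]:
    """Split the 1-based inclusive range start..end into chunks of at most size bp."""
    if size <= 0 or start < 1 or end < start:
        raise ValueError("invalid region or chunk size")
    intervals = []
    pos = start
    while pos <= end:
        stop = min(pos + size - 1, end)  # last chunk may be shorter
        intervals.append(f"{contig}:{pos}-{stop}")
        pos = stop + 1
    return intervals


if __name__ == "__main__":
    # 140GB job with a 4GB buffer, as in the post
    print(heap_flag(140))  # -Xmx136g
    # "chr01" is a placeholder contig name (assumption)
    for interval in split_region("chr01", 1, 2_500_000, 1_000_000):
        print(interval)
```

Each printed interval would then be one unit of work for a separate job.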
It's possible something outside Java is eating up RAM. Can someone confirm or deny whether GenotypeGVCFs with GenomicsDB inputs writes to typically RAM-backed filesystems? Writes to tmpfs (such as /tmp) or /dev/shm count towards my job's memory limit, so they should be avoided. The documentation isn't clear about what exactly --TMP_DIR achieves, or even whether it's used at all. Maybe there are other Java defines (-D) I could set?
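One way to at least rule candidate directories in or out is to check which filesystem type backs them on the compute node. The sketch below is Linux-only and assumes `/proc/mounts` is readable. The helper name `fs_type_for` is hypothetical. It does not show where GATK or GenomicsDB actually write, only whether a given path sits on tmpfs. The standard JVM property for the default temporary directory is `java.io.tmpdir`. Whether GenotypeGVCFs or the GenomicsDB native code honors it is not something this post confirms.

```python
import os


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts. path should already be
    absolute with symlinks resolved (e.g. via os.path.realpath).
    """
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # >= so that a later mount on the same mount point wins
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        mounts = handle.read()
    # Candidate temp locations; adjust to whatever the job actually uses.
    for candidate in ("/tmp", "/dev/shm", os.getcwd()):
        resolved = os.path.realpath(candidate)
        print(candidate, "->", fs_type_for(resolved, mounts))
```

Run inside the job itself, since mounts on a compute node can differ from the login node. Any path reported as `tmpfs` would count against the job's memory limit.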