GenotypeGVCFs resource problems

beadoro, Edinburgh, UK (Member)

Hello,

I am using GATK 4.0 to run GenotypeGVCFs on a cohort of 50 samples. After several runs that exceeded either the assigned CPUs or the assigned memory (up to ncpus=20 and --java-options "-Xmx10g"), a system administrator of our computing grid suggested that I include

-XX:+UseSerialGC -XX:-BackgroundCompilation

in the java options.
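
For illustration, this is roughly how those options combine with the heap setting when passed through the gatk wrapper's --java-options argument. It is only a dry-run sketch that prints the command instead of running it; the variable names java_opts and cmd are made up for this example, and the file names are the ones from the full command line later in this post.

```shell
# Dry-run sketch: assemble the JVM options and print the resulting command.
# -XX:+UseSerialGC selects the single-threaded garbage collector;
# -XX:-BackgroundCompilation makes JIT compilation run in the foreground
# instead of in separate background compiler threads.
java_opts="-Xmx10g -XX:+UseSerialGC -XX:-BackgroundCompilation"

# java_opts and cmd are hypothetical names used only for this illustration.
cmd="gatk --java-options \"$java_opts\" GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz -O all_samples_gatk4_test.vcf.gz"

# Print rather than execute, so the sketch works without GATK installed.
echo "$cmd"
```

Both flags reduce the number of JVM helper threads, which is presumably why the administrator suggested them for a job that kept overrunning its CPU allocation.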

Now GenotypeGVCFs runs happily on the one CPU that I assign to the job, but it still terminates with memory overruns.
The last job had the following full command line:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation -jar /.../gatk/4.0.1.0/gatk-package-4.0.1.0-local.jar GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz --heterozygosity 0.00144 --heterozygosity-stdev 0.0273 --indel-heterozygosity 2.1E-4 -O all_samples_gatk4_test.vcf.gz

The process ran at the memory limit for most of the time and terminated after 19:23 hrs with this message:

Runtime.totalMemory()=20759052288
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
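
To put that number in context, the byte count can be converted to GiB and compared with the -Xmx20g ceiling. This is a small sketch; bytes_to_gib is a helper name invented for this example and is not part of GATK or the JDK.

```shell
# bytes_to_gib is a hypothetical helper: convert a byte count to GiB
# (1 GiB = 1073741824 bytes), printed with two decimals.
bytes_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}

bytes_to_gib 20759052288   # value reported by Runtime.totalMemory(); prints 19.33
bytes_to_gib 21474836480   # 20 GiB, the -Xmx20g limit; prints 20.00
```

So the heap had grown to roughly 19.3 GiB of the 20 GiB allowed when the error was thrown.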

The output file contained only the header lines.

I could, of course, increase the memory even further. I am puzzled, though, because I previously analysed the same dataset with GATK 3.7, and GenotypeGVCFs ran successfully with 8 GB of memory.

Is there any way I can force GenotypeGVCFs to complete the job with a 'reasonable' amount of memory?

Kind regards,
Beate

Issue · Github
by shlee

Issue Number: 3007
State: closed
Closed By: chandrans
