
GenotypeGVCFs resource problems

beadoro (Edinburgh, UK)


I am using GATK 4.0 to run GenotypeGVCFs on a cohort of 50 samples. After a number of runs that alternately exceeded the assigned CPUs or memory (with up to ncpus=20 and --java-options "-Xmx10g"), a system administrator of our computing grid suggested that I include

-XX:+UseSerialGC -XX:-BackgroundCompilation

in the java options.

GenotypeGVCFs now runs happily on the single CPU that I assign to the job, but it still terminates because it exceeds the memory limit.
The last job used the following full command line:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation -jar /.../gatk/ GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz --heterozygosity 0.00144 --heterozygosity-stdev 0.0273 --indel-heterozygosity 2.1E-4 -O all_samples_gatk4_test.vcf.gz

The process ran at the memory limit for most of its run and terminated after 19 h 23 min with this message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

The output file contained only the header lines.
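To confirm that options such as -Xmx20g and -XX:+UseSerialGC actually reach the JVM, one possible check is a tiny Java class run with the same flags. This is a minimal sketch, not part of GATK; the class name HeapCheck and its helper methods are hypothetical. It only uses the standard Runtime and java.lang.management APIs and prints the heap ceiling, the visible processor count, the active garbage collectors and the JVM arguments as received.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic class (not part of GATK): reports what the running
// JVM actually received from its command-line options.
class HeapCheck {

    // Maximum size the heap may grow to. This is roughly the -Xmx value,
    // although some collectors report slightly less.
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Names of the active garbage collectors. With -XX:+UseSerialGC on
    // HotSpot these are typically "Copy" and "MarkSweepCompact".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapBytes() / (1024 * 1024));
        System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Garbage collectors: " + collectorNames());
        // JVM flags exactly as passed on the command line (e.g. -Xmx20g, -XX:+UseSerialGC)
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Usage, assuming a JDK on the grid node: `javac HeapCheck.java`, then `java -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation HeapCheck`. Keep in mind that -Xmx caps only the Java heap; the resident size of the whole JVM process is larger (metaspace, thread stacks, native buffers), which matters when the scheduler enforces a hard memory limit.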

I could, of course, increase the memory even further. I am puzzled, though, because I previously analysed the same dataset with GATK 3.7, and GenotypeGVCFs ran successfully with 8 GB of memory.

Is there any way I can force GenotypeGVCFs to complete the job with a 'reasonable' amount of memory?

Kind regards,
