GenotypeGVCFs resource problems

beadoro, Edinburgh, UK (Member)


I am using GATK 4.0 to run GenotypeGVCFs on a cohort of 50 samples. After a number of runs that alternately overran the assigned CPUs or memory (up to ncpus=20 and --java-options "-Xmx10g"), a system administrator of our computing grid suggested that I include

-XX:+UseSerialGC -XX:-BackgroundCompilation

in the Java options.

Now GenotypeGVCFs runs happily on the one CPU that I assign to the job, but it still terminates with memory overruns.
The last job had the following full command line:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation -jar /.../gatk/ GenotypeGVCFs -R genome.fna -V cohort.g.vcf.gz --heterozygosity 0.00144 --heterozygosity-stdev 0.0273 --indel-heterozygosity 2.1E-4 -O all_samples_gatk4_test.vcf.gz

The process ran at the memory limit for most of the time and terminated after 19:23 hrs with this message:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

The output file contained only the header lines.

I could, of course, increase the memory even more. I am puzzled, though, because I previously analysed the same dataset with GATK 3.7, and GenotypeGVCFs ran successfully with 8 GB of memory.

Is there any way I can force GenotypeGVCFs to complete the job with a 'reasonable' amount of memory?

Kind regards,
