GenotypeGVCFs resource problems

beadoro · Edinburgh, UK · Member

Hello,

I am using GATK 4.0 to run GenotypeGVCFs on a cohort of 50 samples. After several runs that alternately exceeded the assigned CPUs or memory (up to ncpus=20 and --java-options "-Xmx10g"), a system administrator of our computing grid suggested that I include

-XX:+UseSerialGC -XX:-BackgroundCompilation

in the java options.

Now GenotypeGVCFs runs happily on the single CPU that I assign to the job, but it still terminates with memory overruns.
The last job had the following full command line:

java -Dsamjdk.use_async_io_read_samtools=false \
    -Dsamjdk.use_async_io_write_samtools=true \
    -Dsamjdk.use_async_io_write_tribble=false \
    -Dsamjdk.compression_level=1 \
    -Xmx20g -XX:+UseSerialGC -XX:-BackgroundCompilation \
    -jar /.../gatk/4.0.1.0/gatk-package-4.0.1.0-local.jar GenotypeGVCFs \
    -R genome.fna \
    -V cohort.g.vcf.gz \
    --heterozygosity 0.00144 \
    --heterozygosity-stdev 0.0273 \
    --indel-heterozygosity 2.1E-4 \
    -O all_samples_gatk4_test.vcf.gz

The process ran at the memory limit for most of its runtime and terminated after 19:23 hours with this message:

Runtime.totalMemory()=20759052288
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
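
For reference, the Runtime.totalMemory() figure can be compared with the -Xmx20g setting by converting it from bytes. This is only a minimal arithmetic sketch: the variable names are illustrative, and it relies on the JVM reading the "g" suffix as 1024³ bytes.

```python
# Value reported by the JVM in the error output above, in bytes.
total_memory_bytes = 20759052288

# Heap limit from -Xmx20g; the JVM reads the "g" suffix as 1024**3 bytes.
GIB = 1024**3
xmx_bytes = 20 * GIB

# Convert to GiB and express as a share of the configured limit.
total_gib = total_memory_bytes / GIB
fraction_of_limit = total_memory_bytes / xmx_bytes

print(f"totalMemory: {total_gib:.2f} GiB")            # totalMemory: 19.33 GiB
print(f"share of -Xmx20g: {fraction_of_limit:.1%}")   # share of -Xmx20g: 96.7%
```

In other words, the heap had grown to roughly 19.3 GiB, essentially the full 20 GiB allowance, when the error was raised.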

The output file contained only the header lines.

I could, of course, increase the memory even further. I am puzzled, though, because I previously analysed the same dataset with GATK 3.7, and GenotypeGVCFs ran successfully with 8 GB of memory.

Is there any way I can force GenotypeGVCFs to complete the job with a 'reasonable' amount of memory?

Kind regards,
Beate

Issue · Github
by shlee

Issue Number: 3007
State: closed
Closed By: chandrans
