GenotypeGVCFs requires too much memory and too many file handles

Hi.
It seems that GenotypeGVCFs requires too much memory and too many file handles.
Command line:
java -XX:-UseCompressedOops -Xms1440g -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=50 -jar /usr/hpc-bio/gatk/GATK.jar -T GenotypeGVCFs -nt 120 -l WARN -R /usr/bio-ref/GRCh38.p9/GRCh38.dna.fa --dbsnp /usr/bio-ref/GRCh38.p9/dbsnp.vcf -nda -maxAltAlleles 25 -A AS_FisherStrand -A AS_QualByDepth -o /ssd//biowrk/CLP/vcf.gatk/proj.GenotypeGVCFs.vcf -V /ssd//biowrk/CLP/gvcf.gatk/1510100/hc.normal.g.vcf -V /ssd//biowrk/CLP/gvcf.gatk/1510109/hc.normal.g.vcf -V /ssd//biowrk/CLP/gvcf.gatk/1510110/hc.normal.g.vcf -V /ssd//biowrk/CLP/gvcf.gatk/1510111/hc.normal.g.vcf ...
gVCF input: 95 exomes
GenotypeGVCFs version: 3.6 or a 3.7 nightly build (3.7.0 does not work with -nt)
OS: CentOS 7.3 (Linux R930 3.10.0-514.6.1.el7.x86_64) and other CentOS 7 systems
Java: 1.8.0_121-b13, 1.8.0_111, or others
Memory (RES): 0.840 TB. Is this too much memory? It also keeps increasing while the tool runs.
File handles:
lsof -p 7724 |wc
11704 ; too many handles?
lsof -p 7724 |grep S15212|wc
120 ; one file is opened 120 times. Can it be opened only once?
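To see which files account for most of the handles, the per-file count can be taken for every path at once rather than one grep at a time. The following is a minimal sketch, assuming a Linux system with /proc mounted (as on CentOS 7); the function name count_open_paths is made up for illustration and is not part of GATK or lsof.

```shell
# Count how many times each path is open in a process (Linux /proc only).
# Usage: count_open_paths <pid>
# Output: one line per path, "<count> <path>", most frequently opened first.
count_open_paths() {
    local pid="$1" fd
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink to the opened file; skip descriptors that
        # disappear while we are reading them.
        readlink "$fd" 2>/dev/null
    done | sort | uniq -c | sort -rn
}

# Example (PID taken from the post above):
# count_open_paths 7724 | head -n 10
```

With -nt 120, a count near 120 for each input gVCF would be consistent with every thread holding its own handle on each file, though that explanation is an assumption and not confirmed in this thread.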
Answers
11704 file handles is less than 64K, so that is not a big problem, but 0.840 TB of memory is too much for most servers.
Try with a lower number of threads.
I have tried with 20 threads on a server with 256 GB of memory, but it ran out of memory too.
The big problem is that memory usage keeps increasing while the tool runs.
@wang_yugui
Hi,
Perhaps try with an even lower number of threads.
Have a look at this thread and this thread.
-Sheila
I have 1.5 TB of memory, so I can clearly see memory usage increase from 150 GB to 800 GB during the run.
The JVM parameters -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=50 are set, so it should not be a JVM problem.
The real problem is not how much memory is needed.
There seems to be a resource leak, for example items in a List that are not removed after use.
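A growth pattern like this is easier to report when it is logged over time. The following is a rough sketch, assuming a Linux system where /proc/<pid>/status exposes VmRSS (resident memory in kB); the function name sample_rss and its arguments are made up for illustration.

```shell
# Print "<unix_time> <rss_kB>" for a process at a fixed interval.
# Usage: sample_rss <pid> [interval_seconds] [max_samples]
# With max_samples 0 (the default) sampling continues until the process exits.
sample_rss() {
    local pid="$1" interval="${2:-60}" count="${3:-0}" n=0 rss
    while rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null); [ -n "$rss" ]; do
        printf '%s %s\n' "$(date +%s)" "$rss"
        n=$((n + 1))
        # Stop once the requested number of samples has been taken.
        [ "$count" -gt 0 ] && [ "$n" -ge "$count" ] && break
        sleep "$interval"
    done
}

# Example: log the resident memory of PID 7724 once a minute.
# sample_rss 7724 60 > genotypegvcfs_rss.log
```

A steadily rising series here shows growth in resident memory only. It does not by itself distinguish a leak from a heap that is still filling toward its configured limit.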
Issue · Github
by Sheila
We run this tool in production and see no such problem. How many sample files are you running on concurrently? We find that we cannot run on more than 200 files concurrently. If you have more you need to merge them hierarchically using CombineGVCFs, as documented.
There are 95 files in one case, and 200 files in another case.
@wang_yugui2,
Please try breaking these down into smaller pools by using CombineGVCFs as Geraldine suggests. You could also try testing whether GenotypeGVCFs can finish genotyping on just a couple of files. Also, please use the stable release of GATK, currently at v3.7. I see you are using v3.6 or some nightly version of 3.7 in your original post.
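One way to set up the smaller pools is to generate one CombineGVCFs command per batch of gVCFs. The following is a dry-run sketch that only prints commands and executes nothing. The jar and reference paths default to the ones in the original post, while the function name print_combine_cmds, the GATK_JAR and REF variables, the batch size, and the batch_NNN.g.vcf output names are assumptions made for illustration.

```shell
# Print one GATK 3 CombineGVCFs command per batch of gVCFs (dry run only).
# Usage: print_combine_cmds <batch_size> <gvcf>...
# Note: paths containing spaces are not handled in this sketch.
print_combine_cmds() {
    local size="$1"
    local jar="${GATK_JAR:-/usr/hpc-bio/gatk/GATK.jar}"
    local ref="${REF:-/usr/bio-ref/GRCh38.p9/GRCh38.dna.fa}"
    local batch=0 n=0 args="" f
    shift
    for f in "$@"; do
        args="$args -V $f"
        n=$((n + 1))
        if [ "$n" -eq "$size" ]; then
            batch=$((batch + 1))
            printf 'java -jar %s -T CombineGVCFs -R %s%s -o batch_%03d.g.vcf\n' \
                "$jar" "$ref" "$args" "$batch"
            args=""
            n=0
        fi
    done
    # Flush the final, possibly smaller, batch.
    if [ "$n" -gt 0 ]; then
        batch=$((batch + 1))
        printf 'java -jar %s -T CombineGVCFs -R %s%s -o batch_%03d.g.vcf\n' \
            "$jar" "$ref" "$args" "$batch"
    fi
}

# Example: batches of 50 (an arbitrary size chosen for illustration).
# print_combine_cmds 50 /ssd//biowrk/CLP/gvcf.gatk/*/hc.normal.g.vcf
```

The resulting batch files would then be passed to GenotypeGVCFs with -V in place of the individual per-sample gVCFs.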
GenotypeGVCFs in GATK 3.7 throws an exception when the -nt option is used, so I have to use GATK 3.6 or a nightly build of 3.7.
@wang_yugui
Hi,
Yes, in that case, it is fine to use a nightly version of 3.7.
-Sheila