GATK Unified Genotyper Too many files open even with ulimit

Hi all!
We have been using GATK for a few years now. Unified Genotyper was working perfectly fine (exome sequencing, TCGA BAM files) for SNVs and indels using the -n option. The calls were taking around 15 to 20 minutes.

Then it changed to -nct and -nt for parallelization. We have assigned 6 CPUs to each computation, so we tried the different options:

  • -nt 6 -nct 1 : This fails very quickly with the "Too many files open" error message.
  • -nt 2 -nct 3 or -nt 3 -nct 2 : This runs for about 10 minutes and then fails with the exact same message.
  • -nt 1 -nct 6 : It runs with no error but takes around 1 hour instead of the 20 minutes we had previously achieved.

We did set the limit of open files to 65535.
It is weird because when it crashes, I go to the tmp directory that I specified in the java command, and there are not even 100 .tmp files there, so I'm wondering what the issue could be.
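
One way to narrow this down, as a minimal sketch only: the open-file limit applies per process and counts every open descriptor (input BAMs, index files, pipes, sockets), not just the .tmp files, and the limit that a job actually runs under may differ from the one set in an interactive shell. The Python snippet below assumes a Linux machine with /proc available; the helper names `nofile_limits` and `count_open_fds` are made up for illustration and are not part of GATK.

```python
import os
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count the open file descriptors of a process via /proc (Linux only).

    `pid` can be a process ID (e.g. that of the running java process)
    or "self" for the current process.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"open descriptors in this process: {count_open_fds()}")
```

Running this in the same environment that launches the java command would show the limit a child process inherits, and calling `count_open_fds` with the PID of the running java process (as the same user) would show how close it gets to that limit before the crash. The limits of an already running process can also be read from /proc/<pid>/limits.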

We have tried it with all versions from 6 to 8 and the same problem happens.

It's really creating a bottleneck for us right now, so I'm wondering why GATK keeps crashing even though the limit is not reached.

Thanks all for your help.
Happy holidays.
Manon

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Hi Manon,

    Sorry to hear you've been having these issues. I'll have the software engineers look into this, but I'm afraid it'll have to wait until the new year, since we're pretty much all just wrapping things up for the break. Feel free to ping us in this thread if we don't get back to you within a few days after Jan 2. Happy holidays to you too!
