too many open files

Hi,

I've seen this problem mentioned many times here, but I wonder if I may have some new contribution here. Our users are running GATK 3.8, and see this error:

##### ERROR MESSAGE: Unable to parse header with error: /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub3088197262002724909.tmp (Too many open files)

I've monitored some runs and noticed that, as a run progresses, more and more open file descriptors are left over:

lsof | grep VariantContextWriterStub
java       67922   frodeli 1563r      REG                8,5    10740643                  968 /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub498651467373697139.tmp (deleted)
java       67922   frodeli 1564w      REG                8,5     6282467                  992 /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub2281266380489957735.tmp

Nearly all VariantContextWriterStub entries carry the (deleted) flag; only a few do not. After 24h of runtime, 1432 out of 1467 entries are reported as deleted. This makes me wonder whether there is a resource leak in GATK, i.e., the temporary files are deleted without being closed first.
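
A minimal sketch of how the proportion above could be measured, assuming lsof output is piped in on standard input. The function name count_deleted_stubs is hypothetical and is not part of GATK or lsof. It prints the number of entries marked (deleted), followed by the total number of VariantContextWriterStub entries.

```shell
# Hypothetical helper: reads lsof output on stdin and prints "<deleted> <total>"
# for entries whose name contains VariantContextWriterStub.
count_deleted_stubs() {
    awk '
        /VariantContextWriterStub/ {
            total++
            # lsof appends "(deleted)" to the name of an unlinked but still open file
            if ($0 ~ /\(deleted\) *$/) deleted++
        }
        END { printf "%d %d\n", deleted, total }
    '
}
```

For example, `lsof -p 67922 | count_deleted_stubs`, where 67922 is the PID from the listing above. Running it periodically would show whether the deleted count keeps growing.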

Regards,

Marcin

Answers

  • SkyWarrior, Turkey, Member ✭✭✭

    You may need to raise your "ulimit" (the open-file limit). On a standalone system that is easy, but on shared infrastructure you will need to ask your administrator to raise the limit.

    On a standalone system, using Docker may also be a good way around this problem.
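
    A limit set in a login shell does not necessarily apply to a job that is already running, so it can help to check the process itself. The following is a minimal sketch that assumes a Linux system with /proc; the function names are hypothetical. Note also that lsof lists more than numbered descriptors (for example the working directory and memory-mapped files), so its line count can differ from the descriptor count.

    ```shell
    # Hypothetical helpers, Linux only (they rely on /proc).

    # Count the descriptors a process currently holds.
    open_fd_count() {
        ls "/proc/$1/fd" | wc -l
    }

    # Print the soft "Max open files" limit in effect for that process.
    soft_nofile_limit() {
        awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
    }
    ```

    For example, `open_fd_count 67922` and `soft_nofile_limit 67922` would compare the descriptors held by the java process from the listing above with the limit it is actually running under.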

  • Thanks. I know about the limit. My question is about a possible bug. It looks like the number of deleted but open temporary files increases with time. It seems likely that GATK deletes these files, but it should close them first.

  • SkyWarrior, Turkey, Member ✭✭✭

    If those files are being left unclosed, then this is definitely a resource leak, as you describe. Can your users try a recent GATK 4.0 version to see whether the leak persists?

  • I will ask them if that's possible, but I think they might not know the answer: some users simply run module load gatk and use whatever version it provides. Do you know of any important difference or incompatibility that could stop them? Otherwise we will just submit a job and see how it goes.

  • Hello,
    I need to perform GATK analysis on my maize chloroplast DNA samples. I have already generated BAM files by aligning the data from my samples to the reference maize chloroplast genome. My objective is to look for single-base changes across the different samples.
    Please give me the steps to perform the analysis on FireCloud. I do not have any programming skills, so please let me know how I can import GATK analysis files from other workspaces or methods.

  • So of course the users don't know how to use v4 yet, because the semantics changed completely. It will take time.

    Regarding the run with 3.8, it crashed after 68 hours with the following error:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836): 
    ##### ERROR
    ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
    ##### ERROR The error message below tells you what is the problem.
    ##### ERROR
    ##### ERROR If the problem is an invalid argument, please check the online documentation guide
    ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ##### ERROR
    ##### ERROR Visit our website and forum for extensive documentation and answers to 
    ##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ##### ERROR
    ##### ERROR MESSAGE: Unable to parse header with error: /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub6301713927497601206.tmp (Too many open files), for input source: /tmp/org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub6301713927497601206.tmp
    ##### ERROR ------------------------------------------------------------------------------------------
    

    This error message is very strange: how is this a user error that I can fix myself? The code had been running for almost 70 hours before it crashed.

    Also, the 'Too many open files' error does not seem plausible, since I had raised the limit to a maximum of 32768 open files, and when I checked with lsof around 4h before the crash, the application had ~3000 open files.

    To check for leaked resources, I dumped the list of open file descriptors throughout the run using lsof | grep $USER > out. I compared the lists from around 24h ago and 10h ago. The newer file contained 99% of the file names reported in the older file, and most files were reported as (deleted). So it really looks like GATK is not closing the temporary files it creates and deletes.

    I'd appreciate any help with this.

    Thanks!
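
    The snapshot comparison described in this post could be sketched as follows, assuming two files that each hold saved lsof output. The function name common_stub_paths is hypothetical. It prints how many VariantContextWriterStub paths appear in both snapshots.

    ```shell
    # Hypothetical helper: count stub temp-file paths present in both lsof snapshots.
    common_stub_paths() {
        old_snapshot=$1
        new_snapshot=$2
        tmp_old=$(mktemp) || return 1
        tmp_new=$(mktemp) || { rm -f "$tmp_old"; return 1; }
        # Keep only the path, dropping the other lsof columns and the "(deleted)" suffix.
        grep -o '/[^ ]*VariantContextWriterStub[^ ]*\.tmp' "$old_snapshot" | sort -u > "$tmp_old"
        grep -o '/[^ ]*VariantContextWriterStub[^ ]*\.tmp' "$new_snapshot" | sort -u > "$tmp_new"
        # comm -12 prints only the lines found in both sorted inputs.
        comm -12 "$tmp_old" "$tmp_new" | wc -l | tr -d ' '
        rm -f "$tmp_old" "$tmp_new"
    }
    ```

    For example, `common_stub_paths out.24h out.10h`, where the two file names are placeholders for the saved snapshots. A count close to the number of paths in the older snapshot means the same temporary files are still being held open.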

  • And now I ran the program with -nt 1 (previous results were for -nt 2), and the problem is gone. That means that the resource leak only applies to the multi-threaded version.

  • Sheila, Broad Institute, Member, Broadie ✭✭✭✭✭

    @MarcinK
    Hi,

    Thanks for reporting your solution. There were many issues with multi-threading in GATK3, which is why the team chose to remove it in GATK4.

    -Sheila

  • Sheila, Broad Institute, Member, Broadie ✭✭✭✭✭

    @DIWAKER
    Hi,

    I think someone will help you here.

    -Sheila
