Information on the I/O libraries used in the GATK4 source code, for runtime I/O optimization

Dear GATK Team,

I want to perform runtime I/O optimization for GATK4 in a distributed environment. For this, I need to know which I/O libraries the GATK4 modules use. I could not find any documentation or relevant forum posts about this for any of the GATK4 modules.

Kindly point me to any relevant resources.

Thank you

Regards

Abhishek Panda

Answers

  • cnorman (United States; Member, Broadie, Dev)
    Accepted Answer

    @abhishekpanda GATK uses quite a few different methods and libraries to access data, depending on the tool, the type of data, the data store, and the type of file system on which the data resides (local disk, cloud storage, and so on). I don't think there is a simple, single answer. All of the code is open source, though, and the dependent libraries are listed in the Gradle build file.
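As a starting point for surveying the dependencies in the Gradle build file, a small script can pull the declared dependencies out of the build file text and flag the ones that look I/O-related. This is a minimal sketch, assuming the build file uses the common Groovy DSL form `configuration 'group:artifact:version'`. The sample text, the `IO_KEYWORDS` list, and the helper names `list_dependencies` and `io_related` are hypothetical and are not taken from GATK's actual build file. Dependencies declared through variables, platform BOMs, or other syntax will be missed.

```python
import re

# Hypothetical, illustrative excerpt of a Gradle dependencies block.
# NOT copied from GATK's build file; "x.y.z" versions are placeholders.
SAMPLE_GRADLE = """
dependencies {
    implementation 'com.github.samtools:htsjdk:' + htsjdkVersion
    implementation 'com.google.cloud:google-cloud-nio:x.y.z'
    compile "org.apache.hadoop:hadoop-client:x.y.z"
    testImplementation 'org.testng:testng:x.y.z'
}
"""

# Matches a dependency configuration keyword followed by a quoted
# "group:artifact[:version]" coordinate (single or double quotes).
DEP_PATTERN = re.compile(
    r"^\s*(implementation|api|compile|runtimeOnly|testImplementation)\s*\(?\s*"
    r"['\"]([\w.\-]+):([\w.\-]+)(?::([^'\"]*))?['\"]",
    re.MULTILINE,
)

# Hypothetical keywords used as a crude heuristic for "probably does I/O".
IO_KEYWORDS = ("htsjdk", "nio", "hadoop")


def list_dependencies(gradle_text):
    """Return (configuration, group, artifact) tuples found in gradle_text."""
    return [(m.group(1), m.group(2), m.group(3))
            for m in DEP_PATTERN.finditer(gradle_text)]


def io_related(deps, keywords=IO_KEYWORDS):
    """Keep only dependencies whose group:artifact contains a keyword."""
    return [d for d in deps
            if any(k in f"{d[1]}:{d[2]}" for k in keywords)]


if __name__ == "__main__":
    # To scan a real checkout, pass open("build.gradle").read() instead.
    for conf, group, artifact in io_related(list_dependencies(SAMPLE_GRADLE)):
        print(f"{conf:20s} {group}:{artifact}")
```

The keyword filter only narrows the list. Each hit still needs to be checked against the code that calls it, since the same library may be used differently depending on the tool and the file system, as noted in the answer.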

  • Thank you @cnorman. I will look into the Gradle file.

    For the Lustre parallel file system, do you have any I/O optimization suggestions for making GATK more performant?