How does the GATK engine process huge input files?

PengWeiPRC, United States (Member)


For most of the GATK sequencing tools, the input files, such as .bam and .vcf files, are very large. Obviously, the GATK engine cannot read them entirely into memory before parsing them and handing them to the walkers. So I was wondering if someone here could briefly explain how the GATK engine reads and parses the input files, especially in the multithreaded case, where multiple map() threads are waiting for input data. Does the engine first read part of the input files, parse and reorganize that part into the GATK formats, and then send it to the map() threads? Or has some other processing pattern been adopted?

Thanks very much.


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Unfortunately we do not currently have the resources to provide detailed support for how the engine works and to explain such low-level operations. You'll need to look at the code and figure it out yourself. Good luck!
