How does the GATK engine process huge input files?
For most GATK sequencing tools, the input files, such as .bam and .vcf files, are very large. Obviously the GATK engine cannot read them entirely into memory, parse them, and then hand everything to the walkers at once. So I was wondering if someone here could briefly explain how the GATK engine reads and parses the input files, especially in the multithreaded case where multiple map() threads are waiting for input data. Does the engine first read part of the input files, parse and reorganize the records into GATK's internal formats, and then send them to the map() threads? Or is some other processing pattern used?
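For concreteness, here is the kind of pattern I have in mind: a producer thread streams the file in small chunks, parses each chunk into records, and feeds them through a bounded queue to several map() worker threads. This is just a generic producer/consumer sketch in plain Python, not GATK's actual code; all names and the chunk size are made up for illustration:

```python
import queue
import threading

SENTINEL = object()  # end-of-input marker passed through the queue

def read_in_chunks(lines, q, chunk_size=2):
    """Producer: stream records in small chunks instead of loading the
    whole file, and hand each parsed chunk to the worker threads."""
    chunk = []
    for line in lines:                    # stands in for streaming a .bam/.vcf
        chunk.append(line.strip())
        if len(chunk) == chunk_size:
            q.put(chunk)                  # blocks if the workers fall behind
            chunk = []
    if chunk:
        q.put(chunk)
    q.put(SENTINEL)

def map_worker(q, results, lock):
    """Consumer: a 'map()' thread blocks until a parsed chunk is ready."""
    while True:
        chunk = q.get()
        if chunk is SENTINEL:
            q.put(SENTINEL)               # re-queue so the other workers stop
            return
        mapped = [rec.upper() for rec in chunk]   # placeholder for real map()
        with lock:
            results.extend(mapped)

def run(lines, n_workers=3, queue_depth=4):
    # The bounded queue caps how much parsed data sits in memory at once.
    q = queue.Queue(maxsize=queue_depth)
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=map_worker, args=(q, results, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    read_in_chunks(lines, q)
    for w in workers:
        w.join()
    return results

records = ["chr1:100 a", "chr1:200 c", "chr2:50 g", "chr2:75 t", "chr3:10 n"]
print(sorted(run(records)))
```

Is this roughly what the engine does, with the reader keeping only a bounded window of parsed records in memory while the map() threads consume them? Or does each thread get its own file handle and genomic shard instead of a shared queue?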
Thanks very much.