How does the GATK engine process huge input files?
For most GATK sequencing tools, the input files (e.g. .bam and .vcf files) are very large. Obviously the GATK engine cannot read them entirely into memory, parse them, and then hand them to the walkers all at once. So I was wondering if someone here could briefly explain how the GATK engine reads and parses its input files, especially in the multithreaded case where multiple map() threads are waiting for input data. Does the engine first read part of the input, parse and reorganize it into GATK's internal formats, and then send it to the map() threads? Or has some other processing pattern been adopted?
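To make my question concrete, here is a minimal sketch of the pattern I am imagining: a reader thread parses the input one shard at a time and feeds worker threads that each run a map()-style function over the records. Everything here (the `Shard`-as-list representation, the `map()` stub, the queue sizes) is my own illustration, not actual GATK code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical producer/consumer sketch: one "reader" feeds shards of
// records to N workers, so the whole file is never held in memory.
public class ShardedReaderSketch {
    static final List<String> POISON = new ArrayList<>(); // end-of-input marker

    public static int run(List<String> records, int nWorkers, int shardSize) throws Exception {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4); // bounded: reader blocks if workers lag
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < nWorkers; i++) {
            results.add(pool.submit(() -> {
                int count = 0;
                while (true) {
                    List<String> shard = queue.take();
                    if (shard == POISON) { queue.put(POISON); break; } // pass shutdown marker on
                    for (String rec : shard) count += map(rec);
                }
                return count;
            }));
        }
        // "Reader" loop: parse and enqueue one shard at a time, never the whole input.
        for (int i = 0; i < records.size(); i += shardSize) {
            queue.put(new ArrayList<>(records.subList(i, Math.min(i + shardSize, records.size()))));
        }
        queue.put(POISON);
        int total = 0;
        for (Future<Integer> f : results) total += f.get();
        pool.shutdown();
        return total;
    }

    static int map(String record) { return 1; } // stand-in for per-record map() work

    public static void main(String[] args) throws Exception {
        List<String> fake = new ArrayList<>();
        for (int i = 0; i < 1000; i++) fake.add("read" + i);
        System.out.println(run(fake, 4, 64)); // prints 1000
    }
}
```

Is this roughly how the engine works, i.e. shards parsed incrementally and handed to the map() threads, or does it do something more sophisticated (e.g. each thread reading its own region of the indexed file)?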
Thanks very much.