Using Interval List with HaplotypeCaller
I have a ~30 GB BAM to pass to HaplotypeCaller, and I'd like to learn how to make it run as quickly as possible. I support a team of CBs who currently can't use our GATK workflow effectively because it takes so long to process BAMs of this size.
First, my understanding is that passing an interval list to HaplotypeCaller enables some kind of parallel processing. Is that correct? If so, given that I'm running this through a WDL on the cloud, will specifying more cores increase parallelization?
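For context on the parallelization question: in GATK's published WDL pipelines, the usual pattern is scatter-gather. The interval list is split into shards, each shard runs as a separate HaplotypeCaller job restricted with `-L`, and the per-shard VCFs are merged afterwards (for example with MergeVcfs or GatherVcfs). The sketch below only builds the per-shard command lines; it does not run them. The helper names, the `ref.fasta` / `input.bam` paths, and the `shard_N.vcf.gz` output naming are hypothetical, and in a real WDL the sharding would normally live in a `scatter` block rather than a Python script.

```python
import shlex
from typing import List


def shard_intervals(intervals: List[str], n_shards: int) -> List[List[str]]:
    """Split intervals into at most n_shards contiguous, near-equal groups."""
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    if not intervals:
        return []
    n_shards = min(n_shards, len(intervals))  # never create empty shards
    size, extra = divmod(len(intervals), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        # The first `extra` shards take one additional interval each.
        end = start + size + (1 if i < extra else 0)
        shards.append(intervals[start:end])
        start = end
    return shards


def haplotypecaller_commands(reference: str, bam: str,
                             shards: List[List[str]],
                             prefix: str = "shard") -> List[List[str]]:
    """Build one HaplotypeCaller argv per shard (hypothetical output names)."""
    commands = []
    for i, shard in enumerate(shards):
        cmd = ["gatk", "HaplotypeCaller",
               "-R", reference, "-I", bam, "-O", f"{prefix}_{i}.vcf.gz"]
        for interval in shard:
            cmd += ["-L", interval]  # -L may be given once per interval
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    example = ["X:1-1500000", "X:1500001-3000000", "X:3000001-4500000",
               "X:4500001-6000000", "X:6000001-7500000"]
    for cmd in haplotypecaller_commands("ref.fasta", "input.bam",
                                        shard_intervals(example, 2)):
        print(shlex.join(cmd))
```

Each printed command is independent, so the shards can run on separate machines at the same time; this is where the wall-clock speedup of scatter-gather comes from, as opposed to giving a single job more cores.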
Also, for my test data set, I'm passing an interval list file formatted like so:
X:1-1500000
X:1500001-3000000
X:3000001-4500000
X:4500001-6000000
X:6000001-7500000
I generated this file by splitting the reference sequence into fixed-size chunks. Is this an acceptable approach, or is there a canonical way to do it?
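The fixed-size chunking described above can be sketched as follows. It assumes contig names and lengths are taken from the reference's `.fai` index (tab-separated, with name and length in the first two columns) and writes one `contig:start-end` interval per line, 1-based and inclusive, which is the plain-text style shown above. The function names are hypothetical. GATK4 also ships a SplitIntervals tool, which is worth comparing against before settling on a hand-rolled script.

```python
from typing import List, Tuple


def chunk_contig(name: str, length: int, chunk_size: int) -> List[str]:
    """Split one contig into 1-based, inclusive intervals of chunk_size bp."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    intervals = []
    for start in range(1, length + 1, chunk_size):
        end = min(start + chunk_size - 1, length)  # last chunk may be shorter
        intervals.append(f"{name}:{start}-{end}")
    return intervals


def read_fai(path: str) -> List[Tuple[str, int]]:
    """Read (contig name, length) pairs from a samtools-style .fai index."""
    contigs = []
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                contigs.append((fields[0], int(fields[1])))
    return contigs


def write_interval_file(path: str, contigs: List[Tuple[str, int]],
                        chunk_size: int) -> None:
    """Write every chunk of every contig, one interval per line."""
    with open(path, "w") as out:
        for name, length in contigs:
            for interval in chunk_contig(name, length, chunk_size):
                out.write(interval + "\n")
```

For a 7,500,000 bp contig named `X` and a chunk size of 1,500,000, `chunk_contig` yields exactly the five intervals listed above.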
Lastly, is there any other documentation (or are there any suggestions) on how to speed up HaplotypeCaller on large BAM files like this one?