Effect of Queue scatter+gather on HaplotypeCaller?
I have a question about choosing the number of scatter jobs when running the HaplotypeCaller in Queue.
Is there a hard and fast rule about how finely you can split up the job? As I understand it, HC does local reconstruction of haplotypes anyway, so splitting into more jobs shouldn't affect the results.
(My current dataset is mouse whole-genome data with 24 samples, and even scattered into 250 jobs, the longest jobs still took ~6 days to run. If I have to re-run HC, I'd love to speed it up by splitting into more jobs, as long as that doesn't affect the results!)