This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!
HaplotypeCaller scattered job run time
I just managed to use HaplotypeCaller with the lasted version of Queue to call variants on 40 human exomes. The HaplotypeCaller job were scattered into 50 sub jobs and spread in our cluster with Sun Grid Engine.
The problem I found is that sub jobs take quite vary time to finish, which is from 5 hours to 80 hours and majority of them are below 55 hours, hence the whole job were actually slowed down by just a few longer sub jobs. I know that part of the difference were definitely caused by the performance of the cluster node running the job, but I think the major cause of the difference is reply on how the job were split. The qscript I used is adapted from here (without filtering part), from which I can not figure out how the job were split. Hence, I am wondering if anyone could tell me based on what (Genomic Regions ?) HaplotypeCaller job were actually scattered and how I can split the job more evenly so most of the sub jobs will finish at about the same time.
Thanks in advance,