
HaplotypeCaller scattered job run time


I just managed to use HaplotypeCaller with the latest version of Queue to call variants on 40 human exomes. The HaplotypeCaller job was scattered into 50 sub-jobs and spread across our cluster with Sun Grid Engine.

The problem I found is that the sub-jobs take widely varying times to finish, from 5 hours to 80 hours, with the majority below 55 hours, so the whole job was actually slowed down by just a few longer sub-jobs. I know that part of the difference was definitely caused by the performance of the cluster node running each job, but I think the major cause lies in how the job was split. The qscript I used is adapted from here (without the filtering part), and from it I cannot figure out how the job was split. Hence, I am wondering if anyone could tell me on what basis (genomic regions?) the HaplotypeCaller job was actually scattered, and how I can split the job more evenly so that most of the sub-jobs finish at about the same time.

Thanks in advance,



Best Answers


  • brspurri (New York, Member)

    I'm running into this exact problem. I have my own interval list, but I don't know how to separate the interval list out into chunks and send a different chunk to each scatter node. Is this even possible? The way it looks to me, scatter manages which portions of the intervals are sent to each scatter node (which is wildly disproportionate - at least the way I know how to use it). Anyone have any advice?
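One possible way to do the chunking yourself, as a rough sketch and not a description of what Queue does internally: split the interval list into N groups with roughly equal total base counts before submitting. The code below assumes intervals written as `chrom:start-end` (1-based, inclusive); the function names `parse_interval` and `split_intervals` are made up for illustration. Note that equal base counts do not guarantee equal run times, since HaplotypeCaller may spend much longer on some regions than on others.

```python
import heapq


def parse_interval(text):
    """Parse 'chrom:start-end' (1-based, inclusive) into (chrom, start, end)."""
    chrom, span = text.strip().rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


def interval_length(interval):
    """Number of bases covered by an inclusive interval."""
    _, start, end = interval
    return end - start + 1


def split_intervals(intervals, n_chunks):
    """Assign intervals to n_chunks groups with roughly equal total length.

    Greedy heuristic: take intervals longest first and always add the next
    one to the group that currently holds the fewest bases.
    """
    chunks = [[] for _ in range(n_chunks)]
    # Min-heap of (total bases so far, chunk index).
    heap = [(0, i) for i in range(n_chunks)]
    heapq.heapify(heap)

    # Remember the input position so each chunk can be put back in input order.
    indexed = sorted(
        enumerate(intervals),
        key=lambda pair: interval_length(pair[1]),
        reverse=True,
    )
    for position, interval in indexed:
        total, idx = heapq.heappop(heap)
        chunks[idx].append((position, interval))
        heapq.heappush(heap, (total + interval_length(interval), idx))

    # Restore input order inside each chunk and drop the position tags.
    return [[iv for _, iv in sorted(chunk)] for chunk in chunks]


if __name__ == "__main__":
    raw = ["chr1:1-100", "chr1:201-250", "chr2:1-50", "chr2:101-200"]
    groups = split_intervals([parse_interval(r) for r in raw], 2)
    for number, group in enumerate(groups):
        bases = sum(interval_length(iv) for iv in group)
        print(number, bases, ["%s:%d-%d" % iv for iv in group])
```

Each group could then be written to its own interval file and handed to a separate HaplotypeCaller job through `-L`, which sidesteps the automatic scatter entirely. Whether that beats the built-in scattering on a given cluster is something to test, not assume.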
