
Using Interval List with HaplotypeCaller


I have a ~30 GB BAM to pass to HaplotypeCaller, and my knowledge of how to make this run as quickly as possible is limited. I support a team of CBs who are currently unable to use our GATK workflow effectively because of how long it takes us to process BAMs of this size.

First, my understanding is that if an interval list is passed to HaplotypeCaller, some kind of parallel processing is done. Is that correct? If so, and given that I'm executing this through a WDL and running it on the cloud, will specifying more cores increase parallelization?

Also, for my test data set, I'm passing an interval list file formatted like so:


For example:


Where this file was generated by chunking out the reference sequence. Is this an acceptable approach for doing this or is there a canonical way of doing it?
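For readers who want to reproduce this kind of simple chunking, here is a minimal sketch in Python. It assumes the intervals are written as GATK-style `contig:start-end` strings (1-based, inclusive). The contig names, lengths, and chunk size are hypothetical placeholders, not the values from my actual file; in practice the lengths would come from the reference's index or sequence dictionary.

```python
def chunk_contigs(contig_lengths, chunk_size):
    """Yield interval strings of the form contig:start-end (1-based, inclusive)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for contig, length in contig_lengths.items():
        start = 1
        while start <= length:
            # The last chunk on a contig is truncated at the contig end.
            end = min(start + chunk_size - 1, length)
            yield f"{contig}:{start}-{end}"
            start = end + 1


# Hypothetical contig names and lengths, for illustration only.
example_contigs = {"chrA": 2500, "chrB": 1000}

intervals = list(chunk_contigs(example_contigs, chunk_size=1000))
for interval in intervals:
    print(interval)
```

Writing these strings one per line to a plain-text file (GATK documents the `.list` and `.intervals` extensions for this format) gives a file that can be passed with `-L`. This is naive fixed-size chunking only; it makes no attempt to balance the amount of data per chunk.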

Lastly, is there any other documentation (or are there any suggestions) on how to speed up processing of large BAM files such as this one with HaplotypeCaller?


Best Answer


  • amr@broadinstitute.org Member, Broadie

    Thanks very much! I'll look at your examples.

  • amr@broadinstitute.org Member, Broadie

    Thanks very much, Geraldine. I notice the examples reference GATK4. Will I get parallelization using 3.7 as well?

    I agree regarding chunking more intelligently. The simple chunking I am using is just to try to see the parallelization working in our custom GATK WDL.

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭

    @amr@broadinstitute.org
    Hi Amr,

    You can find a version for GATK3 here. It is in the process of being updated. Keep an eye out for the release.

