Using Interval List with HaplotypeCaller

Hi,

I have a ~30 GB BAM to pass to HaplotypeCaller, and my knowledge of how to make this proceed as quickly as possible is limited. I support a team of CBs who are currently unable to use our GATK workflow effectively because of how long it takes to process BAMs of this size.

First, my understanding is that passing an interval list to HaplotypeCaller triggers some kind of parallel processing. Is that correct? If so, and given that I'm running this through a WDL on the cloud, will specifying more cores increase parallelization?

Also, for my test data set, I'm passing an interval list file formatted like so:

<chr>:<start_position>-<end_position>

For example:

X:1-1500000
X:1500001-3000000
X:3000001-4500000
X:4500001-6000000
X:6000001-7500000

This file was generated by chunking the reference sequence into fixed-size windows. Is this an acceptable approach, or is there a canonical way of doing it?
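For reference, fixed-size chunking of this kind can be sketched in a few lines of Python. This is only an illustration, not the script actually used: the function name `chunk_contig` is hypothetical, and the 7,500,000 bp length is chosen purely to reproduce the five example lines above (a real run would use the contig lengths from the reference's sequence dictionary).

```python
def chunk_contig(contig, length, chunk_size):
    """Yield 1-based, inclusive intervals "<chr>:<start>-<end>" covering 1..length."""
    if length < 1 or chunk_size < 1:
        raise ValueError("length and chunk_size must be positive")
    start = 1
    while start <= length:
        # Clip the final chunk so it never runs past the end of the contig.
        end = min(start + chunk_size - 1, length)
        yield f"{contig}:{start}-{end}"
        start = end + 1


if __name__ == "__main__":
    # Assumed values, picked only to match the example interval list.
    for interval in chunk_contig("X", 7_500_000, 1_500_000):
        print(interval)
```

Writing one interval per line to a text file gives the same format as the example; the last chunk is clipped when the contig length is not a multiple of the chunk size.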

Lastly, is there any other documentation (or are there any suggestions) on how to speed up processing of large BAM files such as this one with HaplotypeCaller?

Thanks!
Amr

Answers

  • amr@broadinstitute.org Member, Broadie

    Thanks very much! I'll look at your examples.

  • amr@broadinstitute.org Member, Broadie

    Thanks very much Geraldine. I notice the examples referencing GATK4. Will I get parallelization using 3.7 as well?

    I agree regarding chunking more intelligently. The simple chunking I am using is just to try to see the parallelization working in our custom gatk wdl.

  • Sheila (Broad Institute) Member, Broadie, Moderator, admin

    @amr@broadinstitute.org
    Hi Amr,

    You can find a version for GATK3 here. It is in the process of being updated. Keep an eye out for the release.

    -Sheila
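
For anyone trying to see the parallelization discussed in this thread, one common pattern is scatter-gather: run a separate HaplotypeCaller invocation per interval and merge the per-interval VCFs afterwards. The sketch below only builds the command lines and does not execute them. It assumes GATK4-style arguments (`-R`, `-I`, `-L`, `-O`); the helper name `haplotypecaller_commands` and all file names are hypothetical, and GATK 3.x would need its own invocation syntax instead.

```python
def haplotypecaller_commands(intervals, reference, bam, out_prefix):
    """Build one HaplotypeCaller command (as an argument list) per interval."""
    commands = []
    for index, interval in enumerate(intervals):
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,   # reference FASTA (hypothetical path)
            "-I", bam,         # input BAM (hypothetical path)
            "-L", interval,    # restrict this invocation to a single interval
            "-O", f"{out_prefix}.{index:04d}.vcf.gz",  # one output shard per interval
        ])
    return commands


if __name__ == "__main__":
    shards = ["X:1-1500000", "X:1500001-3000000"]
    for command in haplotypecaller_commands(shards, "ref.fasta", "sample.bam", "sample"):
        print(" ".join(command))
```

Each command can then be dispatched as its own task, for example one shard per call inside a WDL scatter block, so the speed-up comes from running shards concurrently rather than from a single invocation.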
