
Does HaplotypeCaller block the bam and/or reference while it is running?

I have been trying to speed up HaplotypeCaller by running multiple instances on the same bam file, giving each instance a different interval through the -L option. We are wrapping our pipeline in a Snakemake workflow. The DAG produced by Snakemake indicates that these instances should be able to run in parallel. However, the logs show them being queued and run one after another rather than in parallel.
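
For reference, a minimal sketch of the per-interval scatter described above. It only builds the command lines and does not run them. The file names (sample.bam, ref.fasta), the output naming scheme and the helper name haplotypecaller_commands are hypothetical. -R, -I, -L and -O are the standard GATK4 HaplotypeCaller arguments for reference, input, intervals and output.

```python
import shlex


def haplotypecaller_commands(bam, reference, intervals, out_prefix):
    """Build one GATK HaplotypeCaller command per interval (hypothetical helper).

    Every command reads the same BAM and reference but is restricted to a
    single interval with -L, so the commands are independent of each other
    and could be submitted as separate jobs.
    """
    commands = []
    for interval in intervals:
        # Turn e.g. "chr3:1-5000000" into a filename-safe tag "chr3_1_5000000"
        tag = interval.replace(":", "_").replace("-", "_")
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference,
            "-I", bam,
            "-L", interval,
            "-O", f"{out_prefix}.{tag}.vcf.gz",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    for cmd in haplotypecaller_commands(
        "sample.bam", "ref.fasta", ["chr1", "chr2", "chr3:1-5000000"], "sample"
    ):
        print(shlex.join(cmd))
```

Nothing in these commands makes one depend on another. Whether they actually run side by side is up to the workflow manager, for example how many jobs or cores Snakemake is allowed to use at once.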

I'm not sure if this is a Snakemake problem, since other steps are correctly run in parallel. One theory we have is that the bam and reference are blocked from use until HaplotypeCaller finishes its given interval, and the next step is only queued once the files are unblocked.
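
One way to confirm from the logs whether the instances really ran one after another is to compare each job's start and end times. Below is a minimal sketch. It assumes those timestamps have already been pulled out of the job logs, and the log format and job names are hypothetical. It reports every pair of jobs whose time windows overlap. An empty result means the jobs ran strictly in sequence.

```python
from datetime import datetime
from itertools import combinations


def overlapping_jobs(jobs):
    """Return pairs of job names whose [start, end) time windows overlap.

    `jobs` maps a job name to a (start, end) pair of datetime objects,
    assumed to have been parsed from each job's log beforehand.
    """
    pairs = []
    for (name_a, (start_a, end_a)), (name_b, (start_b, end_b)) in combinations(
        sorted(jobs.items()), 2
    ):
        # Two half-open windows overlap if each one starts before the other ends
        if start_a < end_b and start_b < end_a:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical timestamps: the second job starts exactly when the first ends
    jobs = {
        "hc_chr1": (datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 30)),
        "hc_chr2": (datetime(2020, 1, 1, 10, 30), datetime(2020, 1, 1, 11, 0)),
    }
    print(overlapping_jobs(jobs))  # → []
```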

In short, does GATK HaplotypeCaller block input files from being used while it is running?

Thanks.


Answers

  • AnaK (USA, Member)

    Hmm, looking at the documentation, it seems this is not recommended. We ended up getting around the lock by subsetting the bam and running HaplotypeCaller (HC) on the smaller bams.

    Thanks for the help!
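
The thread does not say which tool was used to subset the bam. Below is a minimal sketch of that workaround that assumes samtools is used instead. It builds one `samtools view -b -o OUT IN REGION` command per region, followed by `samtools index` so that each smaller bam can be passed to HaplotypeCaller with -I. The helper name, region list and output names are hypothetical. Region queries with samtools view require the input bam to be coordinate-sorted and indexed.

```python
import shlex


def subset_commands(bam, regions, out_prefix):
    """Build samtools commands that split one BAM into per-region BAMs.

    Hypothetical helper. For each region it emits a `samtools view`
    command that writes a BAM with the reads for that region, then a
    `samtools index` command for the new file.
    """
    steps = []
    for region in regions:
        # Filename-safe tag, e.g. "chr2:1-100" -> "chr2_1_100"
        tag = region.replace(":", "_").replace("-", "_")
        out_bam = f"{out_prefix}.{tag}.bam"
        steps.append(["samtools", "view", "-b", "-o", out_bam, bam, region])
        steps.append(["samtools", "index", out_bam])
    return steps


if __name__ == "__main__":
    # Hypothetical input BAM and regions, for illustration only
    for cmd in subset_commands("sample.bam", ["chr1", "chr2"], "sample"):
        print(shlex.join(cmd))
```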
