Does HaplotypeCaller block the bam and/or reference while it is running?

I have been trying to speed up HaplotypeCaller by running multiple instances on the same BAM file, giving each instance a different interval through the -L option. Our pipeline is wrapped in a Snakemake workflow. The DAG produced by Snakemake indicates that these instances should be able to run in parallel. However, the logs show them running one after another rather than at the same time.
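For reference, the scatter-by-interval setup described above can be sketched in plain Python. This is a minimal illustration rather than the actual pipeline. It assumes the GATK4 `gatk` wrapper is on the PATH, and it uses only the standard `-R`, `-I`, `-L` and `-O` arguments. The helper names, file paths and the `calls/` output directory are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def haplotypecaller_cmd(ref, bam, interval, out_dir="calls"):
    """Build a GATK4 HaplotypeCaller command restricted to one interval via -L.

    Hypothetical helper: paths and out_dir are placeholders (out_dir must already exist).
    """
    # Interval strings such as "chr1:1-1000000" contain ':' and '-', awkward in file names.
    safe = interval.replace(":", "_").replace("-", "_")
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-L", interval,
        "-O", f"{out_dir}/{safe}.vcf.gz",
    ]


def run_all(cmds, max_workers=4):
    """Launch up to max_workers commands at once; each one reads the same BAM and reference."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # check=True raises CalledProcessError if any instance exits non-zero.
        return list(pool.map(lambda c: subprocess.run(c, check=True).returncode, cmds))
```

Usage might look like `run_all([haplotypecaller_cmd("ref.fasta", "sample.bam", iv) for iv in ["chr1", "chr2"]])`. Each per-interval VCF would still need to be merged afterwards.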

I'm not sure this is a Snakemake problem, since other steps do run correctly in parallel. One theory we have is that the BAM and reference are locked until HaplotypeCaller finishes its interval, and only then is the next instance started.

In short, does GATK HaplotypeCaller block input files from being used while it is running?

Thanks.

Answers

  • AnaK (USA, Member)

    Hmm, according to the documentation, this doesn't seem to be recommended. We ended up working around the lock by subsetting the BAM and running HC on the smaller BAMs.

    Thanks for the help!
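The workaround in the answer above (subset the BAM, then call on each piece) might be sketched as follows. This is a hypothetical helper, not the poster's actual code. It assumes `samtools` and the GATK4 `gatk` wrapper are installed, and that the input BAM is coordinate-sorted and indexed, which region queries in `samtools view` require. Paths and the `subsets/` directory are placeholders.

```python
def subset_and_call_cmds(ref, bam, region, out_dir="subsets"):
    """Return the commands that subset one region of a BAM and call variants on it.

    Hypothetical helper: the input BAM must be coordinate-sorted and indexed,
    and out_dir must already exist.
    """
    safe = region.replace(":", "_").replace("-", "_")
    sub_bam = f"{out_dir}/{safe}.bam"
    return [
        # Write reads overlapping the region to a smaller BAM.
        ["samtools", "view", "-b", "-o", sub_bam, bam, region],
        # Index the subset so GATK can read it.
        ["samtools", "index", sub_bam],
        # Keep -L so calls stay inside the region even though some reads extend past it.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", sub_bam,
         "-L", region, "-O", f"{out_dir}/{safe}.vcf.gz"],
    ]
```

The three commands for one region must run in order, but different regions work on separate files and can be dispatched independently. Keeping `-L` on the HaplotypeCaller step is a design choice. Reads overlapping a region boundary are copied into the subset, so without `-L` neighbouring subsets could produce calls at the same positions.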
