The current GATK version is 3.3-0

# Haplotype Caller Active Region

Posts: 6Member

I am getting the following output on the progress monitor of Haplotype Caller:

INFO 13:50:30,687 ProgressMeter - 17:73500593 0.00e+00 19.9 h 15250.3 w 100.0% 19.9 h 0.0 s

WARN 13:50:56,090 DiploidExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 17:73335276 has 7 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument

INFO 13:51:30,706 ProgressMeter - 17:73500593 0.00e+00 19.9 h 15250.3 w 100.0% 19.9 h 0.0 s

Why is the active region listed as 17:73500593 when HaplotypeCaller is apparently looking at 17:73335276? It seems like a site ~20,000 bases away would not be considered part of the active region. What does the active region column (-f3) actually display or represent?

Thanks, -Paul Pemberton

• Posts: 6Member

*200,000

Hi there, sorry to get back to you so late, your post slipped through my net.

The progress meter is showing how far the ActiveRegionTraversalEngine has progressed. Those active regions are added to a queue and then sent to the HaplotypeCaller's map function. What gets printed out is the region where the action was happening at the time the ProgressMeter call was triggered; not every region gets mentioned in that output, if that makes sense. If you have a list of regions A, B, C, D, E, F, G, it is quite possible, depending on rate of progression, that you might get the following output:

INFO ... ProgressMeter ... region A
INFO ... ProgressMeter ... region D
WARN ... problem somewhere in region E
INFO ... ProgressMeter ... region G
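
To make the sampling effect concrete, here is a minimal sketch of a meter that fires at a fixed interval and prints whichever region is in flight at that moment. This is a toy model, not GATK code: the function name `sample_progress`, the per-region durations, and the 30-second interval are all assumptions chosen for illustration.

```python
def sample_progress(regions, seconds_per_region, report_interval):
    """Return the regions a fixed-interval progress meter would print.

    Toy model (not GATK code): the meter fires at t = 0, report_interval,
    2 * report_interval, ... and reports the region being worked on then.
    """
    reported = []
    next_report = 0.0
    clock = 0.0
    for region, duration in zip(regions, seconds_per_region):
        end = clock + duration
        # Every meter tick that falls inside [clock, end) reports this region.
        while next_report < end:
            reported.append(region)
            next_report += report_interval
        clock = end
    return reported


# Hypothetical durations, in seconds, for regions A through G.
printed = sample_progress(list("ABCDEFG"), [10, 5, 5, 20, 5, 5, 15], 30)
print(printed)
```

With these made-up durations the meter only ever catches regions A, D and G; regions B, C, E and F finish between two ticks and never appear in the progress lines.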


Geraldine Van der Auwera, PhD

• Posts: 6Member

That makes perfect sense. My question is more along the lines of why it is possible for the progress meter to output something like the following:

Assume that we have the same regions (A, B, C, D, E, F, G), and the progress meter outputs:

INFO ... ProgressMeter ... region A
INFO ... ProgressMeter ... region G
INFO ... ProgressMeter ... region G
INFO ... ProgressMeter ... region G
INFO ... ProgressMeter ... region G
WARN ... problem somewhere in region B
INFO ... ProgressMeter ... region G
.......

Where region B is ~200,000 bp away from region G, isn't it slightly strange that the progress meter jumped ahead and then went back to region B after analyzing region G? Could it be due to multiple threads moving at different rates but all reporting to the same progress meter? It just seemed like strange output, and it did not allow for accurate progress prediction (I understand that accurate progress prediction is always difficult, but I thought the active region output would be closer to reality than 200,000 bp). Does this make sense, and could the multi-threading be the issue?
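
For what it's worth, a pattern like this can arise even without multiple threads if the meter tracks a traversal step that runs ahead of the processing step, as the queue description above suggests. The following is a toy model of that idea, not GATK's implementation: `simulate_log`, the one-region-per-tick processing rate, and the log strings are all hypothetical.

```python
from collections import deque


def simulate_log(regions, traversed_per_tick, problem_regions):
    """Toy model (not GATK code) of traversal running ahead of processing.

    Each tick: the traversal step queues up to traversed_per_tick regions,
    the meter prints the last region traversed, and the processing step
    handles exactly one queued region, warning if it is a problem region.
    """
    queue = deque()
    log = []
    next_index = 0
    last_traversed = None
    while next_index < len(regions) or queue:
        # Traversal step: move ahead and queue the regions found.
        for region in regions[next_index:next_index + traversed_per_tick]:
            queue.append(region)
            last_traversed = region
        next_index = min(next_index + traversed_per_tick, len(regions))
        log.append(f"INFO ProgressMeter {last_traversed}")
        # Processing step: one region per tick, in queue order.
        region = queue.popleft()
        if region in problem_regions:
            log.append(f"WARN problem in {region}")
    return log


# Traversal queues all seven regions in the first tick; B triggers a warning.
for line in simulate_log(list("ABCDEFG"), 7, {"B"}):
    print(line)
```

In this model the meter sits on region G from the first tick onward, while the warning for region B shows up only when the slower processing step reaches it.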

Thanks,

-Paul

• Posts: 373Member, GSA Collaborator ✭✭✭

pedaling through the sauerkraut

I love it! I've never heard that one before, the imagery is fabulous