Haplotype Caller Active Region

I am getting the following output from the HaplotypeCaller progress meter:

INFO 13:50:30,687 ProgressMeter - 17:73500593 0.00e+00 19.9 h 15250.3 w 100.0% 19.9 h 0.0 s

WARN 13:50:56,090 DiploidExactAFCalc - this tool is currently set to genotype at most 6 alternate alleles in a given context, but the context at 17:73335276 has 7 alternate alleles so only the top alleles will be used; see the --max_alternate_alleles argument

INFO 13:51:30,706 ProgressMeter - 17:73500593 0.00e+00 19.9 h 15250.3 w 100.0% 19.9 h 0.0 s

Why is the active region listed as 17:73500593 when HaplotypeCaller is apparently looking at 17:73335276? A position roughly 165,000 bases away would not seem to belong to the same active region. What does the active region column (-f3) actually represent?

Thanks,
-Paul Pemberton

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Hi there, sorry to get back to you so late, your post slipped through my net.

    The progress meter is showing how far the ActiveRegionTraversalEngine has progressed. Those active regions are added to a queue and then sent to the HaplotypeCaller's map function. What gets printed out is the region where the action was happening at the time the ProgressMeter call was triggered; not every region gets mentioned in that output, if that makes sense. If you have a list of regions A, B, C, D, E, F, G, it is quite possible, depending on rate of progression, that you might get the following output:

    INFO ... ProgressMeter ... region A
    INFO ... ProgressMeter ... region D
    WARN ... problem somewhere in region E
    INFO ... ProgressMeter ... region G
    
  • That makes perfect sense. My question is more along the lines of why it is possible for the progress meter to output something like the following:

    Assume that we have the same regions (A, B, C, D, E, F, G). If progress meter outputs

    INFO ... ProgressMeter ... region A
    INFO ... ProgressMeter ... region G
    INFO ... ProgressMeter ... region G
    INFO ... ProgressMeter ... region G
    INFO ... ProgressMeter ... region G
    WARN ... problem somewhere in region B
    INFO ... ProgressMeter ... region G
    .......

    Where region B is ~200,000 bp away from region G, isn't it slightly strange that the progress meter jumped ahead to region G and a warning for region B only appeared afterwards? Could it be due to multiple threads moving at different rates but all reporting to the same progress meter? It seemed like a strange output and did not allow for accurate progress prediction. I understand that accurate progress prediction is always difficult, but I thought the active region output would be closer to reality than 200,000 bp. Does this make sense, and could the multi-threading be the issue?

    Thanks,

    -Paul
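
One possible reading of the queue description given earlier in the thread, sketched in Python. This is an assumption for illustration, not a confirmed explanation and not GATK code: it supposes the traversal step runs ahead and queues regions faster than the downstream step works through them, with the meter reporting only the traversal position. The function `simulate` and its rates are hypothetical, and multi-threading is not modelled at all.

```python
from collections import deque


def simulate(regions, traverse_per_tick, process_per_tick, flagged):
    """Return log lines for a traversal that runs ahead of processing.

    Hypothetical model: both rates must be at least 1. The meter prints
    the last region queued by the traversal; warnings come from the
    slower downstream step that drains the queue.
    """
    log = []
    queue = deque()
    traversed = 0
    while traversed < len(regions) or queue:
        # Traversal discovers regions and queues them for later processing.
        for _ in range(traverse_per_tick):
            if traversed < len(regions):
                queue.append(regions[traversed])
                traversed += 1
        # The meter reports how far the traversal has got, not the processing.
        log.append(f"INFO ... ProgressMeter ... region {regions[traversed - 1]}")
        # The downstream step handles queued regions in order.
        for _ in range(process_per_tick):
            if queue:
                region = queue.popleft()
                if region in flagged:
                    log.append(f"WARN ... problem somewhere in region {region}")
    return log


for line in simulate(list("ABCDEFG"), 3, 1, {"B"}):
    print(line)
```

In this toy run the meter has already reached region F when the warning for region B is logged, and it then stays on G until the queue is drained, which resembles the pattern described above. Whether this matches what the engine actually does in a given run is not established in this thread.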

  • pdexheimer (Member, Dev)

    pedaling through the sauerkraut

    I love it! I've never heard that one before, the imagery is fabulous

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Heh, it's from my native French, "pédaler dans la choucroute". It's too good a phrase to keep confined to a single language.
