What exactly does the --minReadsPerAlignmentStart flag specify in HaplotypeCaller?

Specifically, what does the 'start' component of this flag mean? Do the reads all have to start in exactly the same location? Alternatively, does the flag specify the total number of reads that must overlap a putative variant before that variant will be considered for calling?

Best Answer


  • Sheila, Broad Institute (Member, Broadie)



    It means that the reads will all have to start at the beginning of the active region. Please read more about active regions here: https://www.broadinstitute.org/gatk/guide/article?id=4147

    For example, let's say we have an active region at positions 1-200. If you set --minReadsPerAlignmentStart 250, up to 250 reads that start at position 1 will be used. If you have more than 250 reads, random downsampling will occur. If you have fewer than 250 reads at position 1, all of the reads will be used. Please note that not all of the reads that are used will necessarily span the entire active region.
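The rule described in this answer can be sketched in a few lines of Python. This is only a toy illustration of the behavior as stated above, not GATK's actual downsampling code: the function name `downsample_by_start`, the `(start, end)` read representation, and the read counts are all hypothetical, and applying the limit at every start position (rather than only at the active region start) is an assumption made for simplicity.

```python
import random
from collections import defaultdict

def downsample_by_start(reads, reads_per_start, seed=0):
    """Keep at most `reads_per_start` reads for each alignment start position.

    `reads` is a list of (start, end) tuples. Positions with fewer reads than
    the limit keep all of them; positions with more are randomly subsampled.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    by_start = defaultdict(list)
    for read in reads:
        by_start[read[0]].append(read)

    kept = []
    for start in sorted(by_start):
        group = by_start[start]
        if len(group) > reads_per_start:
            group = rng.sample(group, reads_per_start)
        kept.extend(group)
    return kept

# Hypothetical pileup: 300 reads starting at position 1, 40 at position 50.
reads = [(1, 100)] * 300 + [(50, 149)] * 40
kept = downsample_by_start(reads, reads_per_start=250)
print(sum(1 for r in kept if r[0] == 1))   # 250 (downsampled from 300)
print(sum(1 for r in kept if r[0] == 50))  # 40 (all kept)
```

With the limit set to 250, the 300 reads at position 1 are randomly reduced to 250, while the 40 reads at position 50 are all retained.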


  • chlangley, UCD (Member)


    The default minReadsPerAlignmentStart is 5.
    Is there an explanation or discussion of this choice, and of the circumstances under which another value would be optimal?

    In deep WGS with PCR duplicates removed, the number of reads starting at the boundary of the active region (or indeed at any given position) should be well below 5.
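The arithmetic behind this point can be made concrete. Each read of length L adds to coverage at L positions but begins at only one of them, so the average number of read starts per position is roughly coverage divided by read length. The sketch below assumes uniform coverage; the function name and the depth and read-length values are hypothetical and chosen only for illustration.

```python
def expected_starts_per_position(mean_coverage: float, read_length: int) -> float:
    """Average number of reads whose alignment begins at any one position.

    Assumes uniform coverage: a read of length L covers L positions but
    starts at only one, so starts-per-position is about coverage / L.
    """
    return mean_coverage / read_length

# Hypothetical sequencing depths and read lengths, for illustration only.
for coverage, length in [(30, 150), (100, 100), (60, 250)]:
    rate = expected_starts_per_position(coverage, length)
    print(f"{coverage}x, {length} bp reads: ~{rate:.2f} read starts per position")
```

Under these assumptions, even 100x coverage with 100 bp reads averages only about one read start per position.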

    Am I missing something? I did look at https://www.broadinstitute.org/gatk/guide/article?id=4147.

    Is there a procedure for optimizing the minReadsPerAlignmentStart parameter?

    Thanks for the high quality and timely support.


  • AlexanderV, Berlin (Member)

    I have a followup on that.

    Is -minReadsPerAlignStart used to decide if a region is defined as active?
    Or just as a threshold for the downsampling algorithm?

    If the former, I have the following situation (---- are reads with good base quality):


    Will this be considered an active region?
    The marked area has 5 reads covering it, but just 1 starts at the active region's start.
    So, which is the case? I would not consider this a bad site.

    I think this is also what @chlangley meant:

    In deep WGS with PCR duplicates removed, the number of reads starting at the boundary of the active region (or indeed at any given position) should be well below 5.

    @Sheila, reading your answer (No. 2) again, it seems that it is just for protecting the region from too much downsampling.

    As a followup to this: Is there a minimum # of reads to trigger an active region?

    [...] the per-position score is the probability that the position contains a variant as calculated using the reference-confidence model applied to the original alignment.

    Is there anything written about this reference-confidence model that could answer my question?


  • Sheila, Broad Institute (Member, Broadie)
    edited August 2015


    No, minReadsPerAlignStart is not used to determine whether a region is active. Have a look at how active regions are defined here: https://www.broadinstitute.org/gatk/guide/article?id=4147
    The minReadsPerAlignStart is simply to make sure that the start position of an active region is covered by a certain number of reads. So, it is possible that a region could be marked as active, but not output because there are not enough reads at the start position.

    In your example, the region would be marked as active. The reads themselves do not have to start at the active region start; they simply have to cover it.

    I think that only 1 read is necessary to tag a region as active; however, I will have to check and get back to you.
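The difference between reads that start at a position and reads that merely cover it can be illustrated with a short sketch. The coordinates below are hypothetical (the original diagram in the question was not preserved), and the helper names `starts_at` and `covers` are made up for this example; positions are treated as 1-based and inclusive.

```python
def starts_at(reads, pos):
    """Count reads whose alignment begins exactly at `pos`."""
    return sum(1 for start, end in reads if start == pos)

def covers(reads, pos):
    """Count reads whose aligned span includes `pos` (inclusive bounds)."""
    return sum(1 for start, end in reads if start <= pos <= end)

# Hypothetical 100 bp reads around an active region that starts at position 100.
reads = [(100, 199), (60, 159), (75, 174), (90, 189), (40, 139)]
region_start = 100

print(starts_at(reads, region_start))  # 1 read starts at the region start
print(covers(reads, region_start))     # 5 reads cover the region start
```

This mirrors the situation in the question: five reads overlap the region start, but only one of them begins there.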

    A document on the Reference Confidence Model is in progress.


  • Sheila, Broad Institute (Member, Broadie)

    Hi again.

    With only one read, the calculation will still be done to see whether the region is active, but one read may not be enough evidence to trigger an active region.

