The function of the -max_deletion_fraction argument

Hey people,
I just wanted to clarify the function of the -max_deletion_fraction argument of the UnifiedGenotyper tool.
The documentation says:
"If the fraction of reads with deletions spanning a locus is greater than this value (0.05 by default), the site will not be considered callable and will be skipped. To disable the use of this parameter, set its value to >1".

So the locus is treated as not callable only for SNVs, correct?
Calling of deletions is ultimately regulated by the parameters -min_indel_fraction and -min_indel_count, so -max_deletion_fraction would not actually affect deletion calling, correct?
Thanks...

Issue · Github
by Sheila

Issue Number: 2748
State: closed
Closed By: vdauwera

Answers

  • tedtoal Member
    edited November 2017

    I have the same question. The description as-is leads one to think that a deletion could never be called.

  • Sheila Broad Institute Member, Broadie, Moderator

    @Shrey_S @tedtoal
    Hi,

    Sorry for the delay @Shrey_S. Not sure how this fell off our radar.

    If you take a look at section 5 here, there is a description. However, I agree it is confusing. I will ask the team what exactly it means. I don't think it means sites where greater than 0.05 of the reads have deletions are not called. However, this is a good case to argue for using HaplotypeCaller for variant calling :smile:

    -Sheila

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    The key thing to understand here is that this is applied to sites that are overlapped by deletions that start upstream. The deletion can be called just fine since that is evaluated from the site where the deletion starts. What is getting skipped is any site downstream of the deletion where a substantial proportion of reads show a deletion. See sketch below.

    The reason we have this in UnifiedGenotyper is because the tool is incapable of modeling both the Indel and the nearby SNP in a single pass -- it's like one is handled by its left brain and the other by the right brain. So this logic is basically Righty saying "if I'm seeing a bunch of deletions here, it's either crap sequence or there's something real that Lefty is going to call, so I shouldn't try to call anything here". It's pretty dumb and probably loses real SNPs, but if you relax it you tend to get a whole bunch of false positives. Yet another reason why HaplotypeCaller is far better than UnifiedGenotyper (or any position-based caller that operates similarly) since the haplotype-based logic completely obviates this problem.
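To make the behaviour described above concrete, here is a minimal toy sketch in Python. It is not UnifiedGenotyper's actual code: the read representation, the function names `deletion_fraction` and `snp_callable`, and the coordinates are all made up for illustration. Only the 0.05 default and the "greater than this value" rule come from the documentation quoted in the question. The sketch assumes a deletion is evaluated at the base just before the deleted stretch (position 109 here), which is never affected by the filter.

```python
from typing import List, Tuple

# Toy read: (ref_start, ref_end, deletions), half-open reference coordinates.
# Each deletion is a (del_start, del_end) interval of reference bases the read lacks.
# This representation is hypothetical, not a GATK data structure.
Read = Tuple[int, int, List[Tuple[int, int]]]


def deletion_fraction(reads: List[Read], pos: int) -> float:
    """Fraction of reads spanning `pos` that show a deleted base there."""
    spanning = [r for r in reads if r[0] <= pos < r[1]]
    if not spanning:
        return 0.0
    deleted = sum(any(s <= pos < e for s, e in r[2]) for r in spanning)
    return deleted / len(spanning)


def snp_callable(reads: List[Read], pos: int,
                 max_deletion_fraction: float = 0.05) -> bool:
    """A site is skipped for SNP calling when the fraction exceeds the threshold."""
    return deletion_fraction(reads, pos) <= max_deletion_fraction


# Ten reads over positions 100..119; three of them lack bases 110..112.
reads: List[Read] = [(100, 120, [(110, 113)])] * 3 + [(100, 120, [])] * 7

for pos in (109, 110, 112, 113):
    print(pos, deletion_fraction(reads, pos), snp_callable(reads, pos))
```

In this toy pileup, positions 110 to 112 have a deletion fraction of 0.3 and are skipped for SNP calling, while 109 (where the deletion would be evaluated under the stated assumption) and 113 remain callable. Because a fraction can never exceed 1, passing any threshold above 1, for example `snp_callable(reads, 110, max_deletion_fraction=1.1)`, turns the check off, which matches the "set its value to >1" advice in the documentation.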
