UnifiedGenotyper pooled: how to set the minimum number of reads supporting an alternate allele

Hello,

I am using the UnifiedGenotyper for pooled samples and would like to set the minimum count (or fraction) of reads supporting an alternate allele, in order to distinguish low-frequency alleles from sequencing errors. Is there a way to do this? I couldn't find such an option, or the default threshold, in the tool's documentation. Thank you for your support.

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @Eva
    Hi Eva,

    You are correct: there is no way to do that with UnifiedGenotyper. You can run SelectVariants after UnifiedGenotyper to select the sites that match your criteria.

    -Sheila

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    To clarify, there is no need to do that. The tools model the probability of error and estimate the confidence of each variant call for you, in a way that is more sophisticated than requiring a minimum number of supporting reads and that takes mapping quality as well as base call quality into account. Once you have the variant calls, you apply filtering as described in the Best Practices documentation: variant recalibration if you have enough data, or hard filtering if you don't. Finally, you can filter the data further and subset the variants you are interested in with SelectVariants.

    Note also that you should no longer be using UnifiedGenotyper; you should use HaplotypeCaller instead. Have a look at our Best Practices docs for the latest recommendations.
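As a rough illustration of why base call quality matters more than a raw read count, here is a minimal Python sketch that uses only the standard Phred definition (error probability = 10^(-Q/10)). It is not the genotype likelihood model that UnifiedGenotyper or HaplotypeCaller actually use, it ignores mapping quality, and it assumes errors are independent between reads; the function names are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_all_bases_wrong(base_quals):
    """Probability that every supporting base call is an error.

    Simplifying assumption: errors are independent between reads.
    This is an illustration only, not the caller's real model.
    """
    p = 1.0
    for q in base_quals:
        p *= phred_to_error_prob(q)
    return p


if __name__ == "__main__":
    # Two supporting reads at Q40 versus two supporting reads at Q10.
    print(prob_all_bases_wrong([40, 40]))
    print(prob_all_bases_wrong([10, 10]))
```

Under these assumptions, two high-quality supporting reads are far less likely to be errors than two low-quality ones, which is why a fixed minimum read count is a cruder criterion than a quality-aware model.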

  • Eva (Member)
    edited July 2015

    Thank you, Sheila and Geraldine, for the prompt reply. @Geraldine, thank you for the clarification. I wanted to filter this way to check what effect varying the minimum number of reads has on the number of variants I can call from my pooled data (I have both pooled and individual sequence data for comparison). Sheila replied in this post by saying "...if there are 2 reads present at a site that have alternate alleles with a base quality of 40 or higher, a variant allele should be called at that site". Does that mean the minimum number of reads is 2 (and not 1, for example), in addition to the other factors you mentioned, such as mapping and base qualities?
    I am using UnifiedGenotyper because I am dealing with pooled data, and I thought HaplotypeCaller didn't handle such data. Or has that capability been added recently?
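For the comparison described above, one possible approach is to post-process the called VCF and count how many sites survive at each minimum alternate-read threshold. The sketch below is a minimal, hedged example in plain Python: it assumes the VCF is uncompressed, tab-delimited, and carries a per-sample `AD` (allelic depths) FORMAT field. The function names and the file name in the usage note are hypothetical, and this is not a GATK feature.

```python
def alt_read_counts(vcf_line):
    """Return, per sample, the largest ALT read depth found in the AD field.

    Assumes AD is formatted as "ref,alt1[,alt2...]". Samples with missing
    or malformed AD values are skipped.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return []  # no FORMAT/sample columns on this line
    format_keys = fields[8].split(":")
    if "AD" not in format_keys:
        return []
    ad_index = format_keys.index("AD")
    counts = []
    for sample in fields[9:]:
        values = sample.split(":")
        if ad_index >= len(values):
            continue
        # Skip the first entry (REF depth); keep numeric ALT depths only.
        alt_depths = [int(x) for x in values[ad_index].split(",")[1:] if x.isdigit()]
        if alt_depths:
            counts.append(max(alt_depths))
    return counts


def count_sites_by_min_alt_reads(vcf_lines, thresholds):
    """Count sites where at least one sample has >= t ALT-supporting reads."""
    totals = {t: 0 for t in thresholds}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # header or blank line
        counts = alt_read_counts(line)
        if not counts:
            continue
        best = max(counts)
        for t in thresholds:
            if best >= t:
                totals[t] += 1
    return totals
```

A possible usage, with a placeholder file name: `with open("pooled_calls.vcf") as handle: print(count_sites_by_min_alt_reads(handle, [1, 2, 3, 5]))`. Running the same thresholds on the pooled and the individual call sets would give the side-by-side counts for the comparison.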
