Lower GATK's HaplotypeCaller threshold for allele frequency as part of a variant calling pipeline

alons123 (Herzliya; Member)

Hi!

As part of a variant calling pipeline, I'm interested in lowering the allele frequency threshold in GATK's HaplotypeCaller to 0.01 (1%), if possible. If that isn't possible, is there another variant caller that doesn't filter out variants with low allele frequencies?

Background: I know for a fact that my sample contains at least one variant that doesn't appear in my VCF file when allele frequencies below 0.1 (10%) are filtered out during variant calling, which is the default in some programs. I can see the variant when I inspect the corresponding BAM file with samtools tview, and another lab called that same variant.
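For illustration, a minimum allele-frequency cutoff of the kind described above amounts to a simple check on read counts at a site. This is a minimal Python sketch: the function names and the read counts are hypothetical (not taken from this sample), and it does not reproduce the filtering logic of any particular caller.

```python
def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of the reads at a site that support the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def passes_min_af(ref_reads: int, alt_reads: int, min_af: float) -> bool:
    """True if the alternate-allele fraction reaches the (hypothetical) cutoff."""
    return allele_fraction(ref_reads, alt_reads) >= min_af


# Hypothetical site: 500 reads, 15 of which carry the alternate base (3%).
ref, alt = 485, 15
for cutoff in (0.10, 0.01):
    status = "reported" if passes_min_af(ref, alt, cutoff) else "filtered out"
    print(f"min AF {cutoff:.2f}: AF = {allele_fraction(ref, alt):.3f} -> {status}")
# min AF 0.10: AF = 0.030 -> filtered out
# min AF 0.01: AF = 0.030 -> reported
```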

Thanks in advance,
Alon

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @alons123
    Hi Alon,

    Do you know whether the variant is real, or are you assuming it is real from the IGV screenshot? HaplotypeCaller performs a realignment that may change the positions of the reads. You can use the -bamout (--bamOutput) argument to see the BAM file after realignment: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--bamOutput

    Thanks,
    Sheila

  • alons123 (Herzliya; Member)

    Hi Sheila,
    Thanks for the quick answer!

    I'm now running HaplotypeCaller with the --bamOutput option as you suggested and am waiting for the results.
    Yes, I'm assuming it's real from the IGV screenshot, but also because a lab we work with found that variant and had it in their VCF file. I think the problem is the allele frequency, since I've managed to find the variant with another program after manually lowering its allele frequency threshold argument.

    Thanks,
    Alon

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie, admin)
    Accepted Answer

    HC doesn't explicitly filter on allele frequency, but because of how the probabilities are evaluated, it's unlikely that a variant at such a low frequency would be called. You'd be better off either using an artificially high ploidy setting or using a somatic variant caller like MuTect, which can handle that sort of frequency.
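    A rough way to see why is a toy genotype-likelihood calculation. The sketch below assumes a simplified binomial read model with a fixed error rate and no genotype priors; it is not HaplotypeCaller's actual model (which works from local assembly and per-read likelihoods), and the function names and numbers are hypothetical. With 10 alternate reads out of 1,000 (1%), the diploid model prefers hom-ref, whereas a ploidy-100 model has a 1-in-100 genotype that fits the data.

```python
import math


def log_likelihood(alt_reads, depth, alt_copies, ploidy, error_rate):
    """Binomial log-likelihood (constant term dropped) of the read counts
    under a genotype with `alt_copies` alternate copies out of `ploidy`.
    error_rate: assumed probability that a read reports the other allele."""
    f = alt_copies / ploidy  # expected alt-allele fraction for this genotype
    # A read shows the alt allele if it comes from an alt copy and is read
    # correctly, or from a ref copy and is misread.
    p_alt = f * (1 - error_rate) + (1 - f) * error_rate
    ref_reads = depth - alt_reads
    return alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)


def best_genotype(alt_reads, depth, ploidy, error_rate):
    """Number of alt copies (0..ploidy) with the highest likelihood."""
    return max(
        range(ploidy + 1),
        key=lambda k: log_likelihood(alt_reads, depth, k, ploidy, error_rate),
    )


# Hypothetical site: 10 alt reads out of 1,000 (1%), assumed error rate 0.001.
for ploidy in (2, 100):
    k = best_genotype(alt_reads=10, depth=1000, ploidy=ploidy, error_rate=0.001)
    print(f"ploidy {ploidy}: best number of alt copies = {k}")
# ploidy 2: best number of alt copies = 0
# ploidy 100: best number of alt copies = 1
```

    The 1% signal only wins here because the assumed error rate (0.1%) is well below 1%. With an error rate of 1%, the same sketch returns 0 alt copies even at ploidy 100, which is one reason a dedicated low-frequency caller can be the better choice.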

  • alons123 (Herzliya; Member)
    edited July 2015

    Thank you, Geraldine,
    I'll probably start working with MuTect as you suggested.
    One thing, though: I don't quite understand why a high ploidy setting would solve this. Can you explain, please?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)
    Accepted Answer

    @alons123
    Hi Alon,

    When you set a higher ploidy, the allele fractions the tool expects to detect become lower. For example, with ploidy 2 the tool looks for alleles present at approximately 50% each. With ploidy 3, the tool expects alleles present at approximately 33% (or 67%).
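    To make the arithmetic concrete: under an N-ploid model the non-reference allele fractions a genotype can produce are k/N for k = 1 to N-1, so the lowest is 1/N, and detecting a 1% allele calls for a ploidy of about 100. A minimal Python sketch of that arithmetic (the function names are hypothetical; in GATK 3.x the ploidy is, to my knowledge, set with HaplotypeCaller's -ploidy / --sample_ploidy argument, so check the documentation for your version):

```python
import math
from fractions import Fraction


def expected_allele_fractions(ploidy):
    """Non-reference allele fractions an N-ploid genotype can produce: k/N, k = 1..N-1."""
    return [Fraction(k, ploidy) for k in range(1, ploidy)]


def min_ploidy_for(target_af):
    """Smallest ploidy N whose lowest fraction, 1/N, is at or below target_af.
    Fraction keeps "0.01" exact and avoids floating-point rounding."""
    return math.ceil(1 / Fraction(target_af))


for ploidy in (2, 3, 4):
    print(ploidy, [f"{float(f):.0%}" for f in expected_allele_fractions(ploidy)])
# 2 ['50%']
# 3 ['33%', '67%']
# 4 ['25%', '50%', '75%']

print(min_ploidy_for("0.01"))  # 100
print(min_ploidy_for("0.1"))   # 10
```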

    I hope this helps.

    -Sheila

  • alons123 (Herzliya; Member)
    edited August 2015

    Thank you Sheila,
    This helps tremendously; I completely understand now.

    Thanks again everybody!
