Why does Mutect2 return SNPs with non-zero alt-allele freq's in the normal sample, but 0 alt reads?

pwaltman, New York, NY (Member)

I'm finding that most of the somatic SNPs that Mutect2 is inferring have non-zero alt-alleles in the normal sample, but when I look at the AD field, they almost always have 0 reads with the alt-allele (see example below).

Is this because of the PON that I used? I used the one provided in the Mutect2 resource kit, because my data set is too small (n=10) to generate my own.

example below (copied from R, which was used to parse the vcf):

observedInType2AF$vcf[1:3, c( 1:7, 9:10, 42:44 ) ]

chrom pos id ref alt qual filter
794 1 9795131 . T C . PASS
12960 1 148004625 . C A . PASS
format
794 GT:AD:AF:DP:F1R2:F2R1:OBAM:OBAMRC:OBQ:OBQRC:SAAF:SAPP
12960 GT:AD:AF:DP:F1R2:F2R1:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:SAAF:SAPP
ac1
794 0/0:38,0:0.098:38:19,0:19,0:false:false
12960 0/0:40,0:0.053:40:24,0:16,0:false:false
cuac1856
794 0/1:58,52:0.473:110:20,25:38,27:false:false:52.65:100.00:0.444,0.455,0.473:0.024,0.014,0.962
12960 ./.
cuac1857
794 0/1:15,7:0.327:22:6,2:9,5:false:false:53.96:100.00:0.273,0.303,0.318:0.021,0.023,0.956
12960 ./.
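
The mismatch in the example above can be checked mechanically. The following is a minimal Python sketch, assuming the FORMAT column and a sample column are already available as strings; the helper names `parse_sample` and `af_without_alt_reads` are hypothetical, and the values are the ones shown for record 794.

```python
def parse_sample(format_keys: str, sample_value: str) -> dict:
    """Pair colon-separated FORMAT keys with one sample's values."""
    # zip stops at the shorter side, so a sample column with fewer values
    # than FORMAT keys (as in the pasted normal column) yields fewer fields.
    return dict(zip(format_keys.split(":"), sample_value.split(":")))


def af_without_alt_reads(fields: dict) -> bool:
    """True when AF is non-zero but AD reports no alt-supporting reads."""
    alt_depths = [int(x) for x in fields["AD"].split(",")[1:]]
    alt_fractions = [float(x) for x in fields["AF"].split(",")]
    return sum(alt_depths) == 0 and any(f > 0 for f in alt_fractions)


FORMAT_794 = "GT:AD:AF:DP:F1R2:F2R1:OBAM:OBAMRC:OBQ:OBQRC:SAAF:SAPP"

# Normal sample (ac1) and one tumor sample (cuac1856) for record 794.
normal_794 = parse_sample(FORMAT_794, "0/0:38,0:0.098:38:19,0:19,0:false:false")
tumor_794 = parse_sample(
    FORMAT_794,
    "0/1:58,52:0.473:110:20,25:38,27:false:false:52.65:100.00"
    ":0.444,0.455,0.473:0.024,0.014,0.962",
)

print(normal_794["AD"], normal_794["AF"], af_without_alt_reads(normal_794))
print(tumor_794["AD"], tumor_794["AF"], af_without_alt_reads(tumor_794))
```

The normal sample is flagged (AD 38,0 with AF 0.098), while the tumor sample is not, since it has 52 alt-supporting reads.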

Answers

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)

    Hi @pwaltman

    Please read this blog about a new QUAL model and how genotype likelihood is used to calculate allele frequencies: https://software.broadinstitute.org/gatk/blog?id=23525

    When our tools calculate AF, they take into account the likelihoods of the reads supporting each allele. Those likelihoods are not available in the VCF output, but I think you can have them written to an intermediate file (you would have to look into the code to do that).

  • pwaltman, New York, NY (Member)

    Thanks @bhanuGandham, but I read the provided link, and it focused primarily on germline analysis tools rather than Mutect2. Even then, it didn't explain how an allele frequency could be estimated for the normal if there were no reads in the normal that support the alt-allele.

  • bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
    edited April 22

    Hi @pwaltman

    Let's back up a little. I guess I don't quite understand the question.

    794 0/0:38,0:0.098:38:19,0:19,0:false:false
    12960 0/0:40,0:0.053:40:24,0:16,0:false:false

    Here the genotype (GT 0/0) matches the allele depth (AD 38,0 and 40,0) values. So I am not sure what you mean by:

    have non-zero alt-alleles in the normal sample, but when I look at the AD field, they almost always have 0 reads with the alt-allele

    Can you please clarify? Maybe I am missing something here.

  • pwaltman, New York, NY (Member)

    Sorry for not following up before. Yes, neither variant has any reads supporting the alt-allele. However, both have non-zero alt-allele frequencies, i.e. for the first variant the genotype is 0/0, and the reads are 38,0. However, the alt-allele frequency is 0.098 - a non-zero allele frequency, even though there are no reads to support that alt-allele.

    I guess I understand that Mutect2 is using some sort of likelihood model to estimate the allele frequency, but if I'm trying to filter out SNP calls that appear in the normal, it's not clear how I should treat a case like this (a SNP with no reads supporting the alt-allele, but with a non-zero alt-allele frequency).

  • AdelaideR (Member, admin)

    I am going to reach out to the development team for some more information, but the likelihoods are calculated across the entire genome, which can lead to some confidence intervals that are not reflective of an unsupported alt-allele.

    You might want to take a look at the documentation about why allele depths are lower than expected.

  • AdelaideR (Member, admin)

    @pwaltman

    This is the information that I received (from @davidben):

    The alt allele frequency should be thought of as the alt allele frequency _conditional on there being a real variant_. As an example, suppose there were a location with 40 reads, 4 of which have an alt base, but all four are at low base quality. Then the alt allele frequency will be reported as 10%, but no variant will be called because Mutect2 will recognize this as an artifact, and the alt allele depth will be 0. As far as filtering reads in the tumor sample, there is no need for the user to attempt to do this by hand; Mutect2 will already make the decision about whether or not variants should be filtered.
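
The arithmetic of that example can be written out as a toy Python sketch. This is only an illustration of the numbers above, not Mutect2's likelihood model: the base qualities (35 and 8) and the hard cutoff `MIN_BASE_QUALITY` are assumptions made up for the illustration.

```python
# Toy numbers from the example: 40 reads, 4 of which carry the alt base
# at low base quality. MIN_BASE_QUALITY is a hypothetical cutoff used only
# here; Mutect2 itself uses a likelihood model, not a hard threshold.
MIN_BASE_QUALITY = 20

# Each read is (allele, base quality); the quality values are invented.
reads = [("ref", 35)] * 36 + [("alt", 8)] * 4

alt_qualities = [quality for allele, quality in reads if allele == "alt"]

# Fraction of reads showing the alt base, regardless of quality.
raw_alt_fraction = len(alt_qualities) / len(reads)

# Alt reads that would count as confident support under the toy cutoff.
confident_alt_depth = sum(1 for q in alt_qualities if q >= MIN_BASE_QUALITY)

print(f"raw alt fraction: {raw_alt_fraction:.2f}")
print(f"alt reads passing the toy quality cutoff: {confident_alt_depth}")
```

The fraction comes out at 10% while the count of confidently supporting alt reads is 0, which is the same pattern as a non-zero AF next to an AD of 0 for the alt allele.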
    
    
  • pwaltman, New York, NY (Member)

    Thanks @AdelaideR! I've definitely observed putative somatic SNVs returned by Mutect2 where there are reads in the normal sample that also contain the alt-allele. However, I didn't think to consider whether there were enough reads with the alt-allele in the normal to indicate whether it is a true het mutation. If fewer than 50% of the reads in the normal contain the alt-allele, would that indicate that the normal contained the somatic mutation as well?

  • davidben, Boston (Member, Broadie, Dev)

    @pwaltman FilterMutectCalls looks for two things with alt reads in the normal. First, it attempts to genotype a diploid germline variant in the normal using the NLOD computed by Mutect2 and the POPAF prior in the unfiltered VCF from the germline resource. If it genotypes as a germline het it gets the germline filter. Second, it attempts to diagnose whether the alt reads are an artifact (that is, a technical or mapping artifact but not a sequencing error) in the normal. If they are, FilterMutectCalls assumes that they must be an artifact in the tumor as well. This relies on an internal model for the overall rate of artifacts in the normal, the somatic clustering model for allele fractions of real somatic variants, and the NALOD calculated by Mutect2, which is the likelihood of the alt reads in the normal arising from sequencing error, which has no bearing on their status in the tumor.

    You can read about this in our recently-updated documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf.

    By the way, because this is the forum and other people read this, I want to discourage users from the idea that they can replace FilterMutectCalls with a set of heuristics or thresholds. The tool is heavily model-based and can't be reduced to anything less. If users are not satisfied with FilterMutectCalls, letting us know what's going wrong will be more productive than hand-crafting an alternative filtering scheme.
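
In line with that advice, downstream code can read the FILTER column that FilterMutectCalls writes instead of re-deriving thresholds from AD or AF. A minimal Python sketch follows, assuming a tab-separated VCF excerpt; the first two records are the PASS calls from the question, the other two are hypothetical, and the filter names `germline` and `normal_artifact` are an assumption about what the tool writes (check the `##FILTER` header lines of your own output).

```python
from collections import Counter

# Excerpt of a filtered VCF, first seven columns only. Records at
# positions 1000 and 2000 are hypothetical and exist only for this sketch.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER
1\t9795131\t.\tT\tC\t.\tPASS
1\t148004625\t.\tC\tA\t.\tPASS
1\t1000\t.\tG\tA\t.\tgermline
1\t2000\t.\tA\tT\t.\tnormal_artifact;germline
"""

FILTER_COLUMN = 6  # zero-based index of FILTER in a VCF record


def records(vcf_text: str) -> list:
    """Split non-header lines into their tab-separated columns."""
    return [
        line.split("\t")
        for line in vcf_text.splitlines()
        if line and not line.startswith("#")
    ]


def tally_filters(vcf_text: str) -> Counter:
    """Count how often each FILTER value occurs (';' separates filters)."""
    counts = Counter()
    for fields in records(vcf_text):
        counts.update(fields[FILTER_COLUMN].split(";"))
    return counts


def passing_positions(vcf_text: str) -> list:
    """Return (chrom, pos) for records whose FILTER is exactly PASS."""
    return [
        (fields[0], int(fields[1]))
        for fields in records(vcf_text)
        if fields[FILTER_COLUMN] == "PASS"
    ]


counts = tally_filters(VCF_TEXT)
passing = passing_positions(VCF_TEXT)
print(dict(counts))
print(passing)
```

Tallying the filters first shows why calls were rejected, which is also the kind of detail worth including when reporting a suspected filtering problem.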
