Filter by TLOD only in Mutect2

cmartinezruiz, United Kingdom, Member
I am using Mutect2 in GATK v4.1.4.0 to look for somatic variants in several tumor samples with matched germline. Because of the nature of the samples, I know I can trust variants with relatively low VAF, so I wanted to relax the filtering to allow tumor variants with an LOD similar to that of germline variants (~ 2.2). In previous versions of Mutect2 I would have simply set --tlod to 2.2 during the filtering step. The newest versions of Mutect2, however, do not have this option anymore and rely instead on a beta score (--f-score-beta) to tighten or relax the false discovery rate during the filtering step.
The issue with the beta score is that if I relax the filtering to a point where variants with TLOD >= 2.2 pass the filter, I end up with many variants with very low values in other fields (e.g. STRANDQ=1).
I could relax the filter to allow TLOD >= 2.2 and then manually filter the resulting VCF again to remove variants with low values in other fields, but this seems a rather convoluted way of approaching the issue, and it feels like there should be a better way to do it.
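
For illustration, the manual second pass described above might look like the following sketch. It assumes TLOD and STRANDQ are present as INFO annotations (as in the STRANDQ=1 example), takes the largest value when an annotation lists one number per alternate allele, and ignores the FILTER column entirely. The function names and the STRANDQ cutoff of 20 are hypothetical placeholders, not GATK defaults or recommendations.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=400;TLOD=3.1' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag-type entries get an empty string
    return info


def max_value(raw):
    """Largest number in a comma-separated INFO value (one per alt allele)."""
    return max(float(x) for x in raw.split(","))


def keep_record(line, min_tlod=2.2, min_strandq=20.0):
    """Return True if a VCF data line meets both thresholds.

    min_strandq=20.0 is an arbitrary placeholder, not a recommended value.
    Records without TLOD are dropped; records without STRANDQ are judged
    on TLOD alone.
    """
    fields = line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the 8th VCF column
    if "TLOD" not in info:
        return False
    if max_value(info["TLOD"]) < min_tlod:
        return False
    if "STRANDQ" in info and max_value(info["STRANDQ"]) < min_strandq:
        return False
    return True


def filter_vcf_lines(lines, **thresholds):
    """Yield header lines unchanged and data lines that pass keep_record."""
    for line in lines:
        if line.startswith("#") or keep_record(line, **thresholds):
            yield line
```

A typical use would be to open the relaxed FilterMutectCalls output, pass the file object to filter_vcf_lines, and write the yielded lines to a new file.
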
In short, is there a way in the latest Mutect2 versions to allow for variants with low TLOD to pass the filtering step without relaxing all filters in the other fields?
Thank you!

  • bhanuGandham, Cambridge MA, Member, Administrator, Broadie, Moderator

  • davidben, Boston, Member, Broadie, Dev

    @cmartinezruiz You could try setting --log-snv-prior and --log-indel-prior to higher (less negative) values than their defaults of -13.8, but this strikes me as sketchy. A TLOD of 2.2 means that the likelihood of somatic variation is only 100 times that of sequencing error, and thus such variants are only really believable if you have an overwhelmingly high rate of somatic variation, at least one in 100 sites.
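
The arithmetic behind this argument can be sketched as a simple two-hypothesis Bayes calculation. This is an illustration only, not Mutect2's actual filtering model: it assumes TLOD is a log10 likelihood ratio of variant versus sequencing error, that the prior is given as a natural log probability (as the -13.8 default suggests), and the function name is hypothetical.

```python
import math


def posterior_somatic(tlod, ln_prior):
    """Posterior probability of a real somatic variant in a two-hypothesis model.

    Assumptions (not taken from the GATK source code):
      - tlod is a log10 likelihood ratio, variant vs. sequencing error
      - ln_prior is the natural-log prior probability of a somatic variant
    """
    prior = math.exp(ln_prior)
    likelihood_ratio = 10.0 ** tlod
    # Posterior odds = likelihood ratio * prior odds.
    odds = likelihood_ratio * prior / (1.0 - prior)
    return odds / (1.0 + odds)


# With the default-like prior of about one somatic variant per million sites,
# a TLOD of 2.2 leaves the posterior well below 0.001.
default_case = posterior_somatic(2.2, -13.8)

# Only with a prior of roughly one in 100 sites does the posterior exceed 0.5.
high_rate_case = posterior_somatic(2.2, math.log(0.01))
```

Under these assumptions, raising the prior is the only way a TLOD of 2.2 becomes convincing, which is why the suggestion above is described as sketchy unless somatic variation really is that common.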

    I have to wonder why these variants are compelling if their TLOD is so low. It is possible for low-AF variants to have a high TLOD in the case of high depth or high-quality reads, but if you have neither, what is there to distinguish these from sequencing error?

  • cmartinezruiz, United Kingdom, Member
    Thanks @davidben, I think I had misunderstood what TLOD was doing then. Is LOD set to 2.2 for the germline variants because we expect a high proportion of those, then?

    I am trying to run Mutect2 on healthy tissue to detect somatic variants. To be clear, I am looking for somatic variants present only on the focal tissue, so essentially, I am running Mutect2 using blood DNA as a normal and the DNA from the focal healthy tissue as "tumor". Both blood and focal tissue were sequenced at an average coverage of 400x. I expect to find relatively few variants at low AF.

    I assumed that because Mutect2 has been designed for tumor samples, the filtering would be very stringent to account for the noisiness of cancer data. Because I expect healthy tissue to be more homogeneous than cancer tissue, I was looking for a way to relax those filters. I assumed that TLOD would be the variable to look at, but it looks like I was mistaken.

    What would be the best approach in this case, then? Is the default filtering in Mutect2 not too stringent for detecting variants in healthy tissue?

    Thanks again!
  • cmartinezruiz, United Kingdom, Member
    @davidben yes I see now how that makes sense. I had understood the filtering step in the complete opposite way. Thank you very much!
