High AF with Mutect2 in tumor only mode - should I filter?

Hello,
I'm trying to run Mutect2 in tumor-only mode without a PoN on some WES samples. I am well aware that these are not optimal conditions, but I have neither a panel of normals obtained with the same protocol (would putting together some 1000G samples be better than nothing?) nor a matched normal. I know that I'm facing a high number of false germline calls, and I would like to know which additional filters seem to work best in this setup.

I followed the pre-processing best practices and am using "--germline-resource af-only-gnomad.hg38.vcf.gz --af-of-alleles-not-in-resource 0.0000025".
In one of the samples (the numbers are pretty similar for the other ones) I am getting 2962 passed mutations starting from 66047.
I am filtering out some mutations with POP_AF=1, believing that they are SNPs where the reference carries the minor allele, so that probably all gnomAD samples would be listed as having the alternate allele... Nevertheless, they are listed as PASS, with a NaN risk of being germline.
The distribution of the AF I get has two peaks, one at lower AFs and one around 1. Should I remove all of the latter, considering them germline and/or technical artifacts that were not filtered due to the lack of a matched normal and/or PoN?
What other parameters should I consider?

Thank you very much. I know fairly well that I am in an unsupported context, but right now this is all I have,
E.
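
A minimal sketch of how the two AF peaks described above could be inspected before deciding on any cutoff. It assumes the tumor allele fractions of the PASS calls have already been extracted into a plain list of floats. The function names `split_by_af` and `af_histogram` and the 0.9 threshold are illustrative assumptions, not GATK recommendations.

```python
def split_by_af(afs, high_af_threshold=0.9):
    """Partition allele fractions into a lower group and a near-1 group.

    The default threshold is an arbitrary illustration, not a validated cutoff.
    """
    low, high = [], []
    for af in afs:
        (high if af >= high_af_threshold else low).append(af)
    return low, high


def af_histogram(afs, n_bins=10):
    """Count allele fractions in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for af in afs:
        # AF == 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(af * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts
```

Looking at the histogram first makes it easier to judge whether the peak near 1 is cleanly separated from the rest before removing anything.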

Answers

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @ElenaGrassi

    We are looking into this issue and will get back to you shortly.

  • bhanuGandham Member, Administrator, Broadie, Moderator admin

    Hi @ElenaGrassi

    I reached out to our dev team and this is what they had to say:

    1) The PoN is actually not useful for filtering out germline calls. That's the job of the --germline-resource (which should be gnomAD). It's actually an undesirable side effect of the PoN that we'll address in the next few months.

    2) Users shouldn't use --af-of-alleles-not-in-resource unless they really know what they're doing. The defaults are smart.
    As for POP_AF=1, I would think that in tumor-only mode the germline filter would always catch that without any extra filtering. The user is running FilterMutectCalls, right?
    Also, a non-matched PoN will still catch a lot of false positives (mainly mapping errors), and is worth using.

    Hope this helps.
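
The advice above can be sketched as a small helper that assembles the two command lines: Mutect2 with a germline resource and an optional (possibly unmatched) PoN, followed by FilterMutectCalls, with `--af-of-alleles-not-in-resource` left at its default. This is only a sketch under assumptions: the function name `build_tumor_only_commands` and all file names are hypothetical, and the argument set follows the 4.0.x releases discussed in this thread (for example `-tumor`). Later releases changed some required arguments, so check the tool documentation for the version in use.

```python
def build_tumor_only_commands(reference, tumor_bam, tumor_sample, germline_resource,
                              unfiltered_vcf, filtered_vcf, panel_of_normals=None):
    """Return (mutect2_argv, filter_argv) for a tumor-only run.

    --af-of-alleles-not-in-resource is deliberately not set, so the
    tool's own default for tumor-only mode applies.
    """
    mutect2 = [
        "gatk", "Mutect2",
        "-R", reference,
        "-I", tumor_bam,
        "-tumor", tumor_sample,          # sample name as in the BAM read group (4.0.x style)
        "--germline-resource", germline_resource,
        "-O", unfiltered_vcf,
    ]
    if panel_of_normals is not None:
        # A generic, unmatched PoN is still said to be worth using.
        mutect2 += ["--panel-of-normals", panel_of_normals]

    filter_calls = [
        "gatk", "FilterMutectCalls",
        "-V", unfiltered_vcf,
        "-O", filtered_vcf,
    ]
    return mutect2, filter_calls
```

The returned lists can be passed to `subprocess.run(argv, check=True)` or joined with spaces for a job script on a cluster.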

  • manba Member ✭✭

    @bhanuGandham I do not think I agree with you about "The PoN is actually not useful for filtering out germline calls". You can see this picture in your workshop. I am really upset to hear that "1) The PoN is actually not useful for filtering out germline calls. That's the job of the --germline-resource (which should be gnomAD). It's actually an undesirable side effect of the PoN that we'll address in the next few months."

  • manba Member ✭✭

    Sad to hear what you said. Maybe GATK4 should provide clearer documentation about what each parameter in Mutect2 or FilterMutectCalls does. Some of your staff give different answers to different questions, which makes me really confused.

  • manba Member ✭✭

    @bhanuGandham I think a clear description of a record like this would help minimize the misunderstanding

    chr1 156844821 . GCTGT G . germline_risk;str_contraction DP=2117;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;RPA=4,3;RU=CTGT;STR;TLOD=27.38 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/1:1919,15:0.051:909,5:929,10:36:206,229:60:37:false:false:0.010,0.010,8.082e-03:6.937e-04,1.012e-03,0.998

    I cannot find a clear explanation of

    POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;RPA=4,3;RU=CTGT;STR;
    and AF seems to be a latent variable as one of your staff said

    MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB: I just do not know what these tell me.

    thanks a lot
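
As a rough illustration of how a record like the one above breaks down, here is a minimal Python sketch that splits it into its FILTER, INFO, and per-sample FORMAT fields. The function names are hypothetical and the parser assumes a single-sample, whitespace-separated record. The authoritative meaning of each key is given in the `##INFO` and `##FORMAT` header lines of the VCF itself, which this sketch does not read.

```python
RECORD = (
    "chr1 156844821 . GCTGT G . germline_risk;str_contraction "
    "DP=2117;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;RPA=4,3;RU=CTGT;STR;TLOD=27.38 "
    "GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB "
    "0/1:1919,15:0.051:909,5:929,10:36:206,229:60:37:false:false:"
    "0.010,0.010,8.082e-03:6.937e-04,1.012e-03,0.998"
)


def parse_vcf_record(line):
    """Split one single-sample VCF data line into its main parts."""
    chrom, pos, _id, ref, alt, _qual, flt, info, fmt, sample = line.split()[:10]
    info_dict = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info_dict[key] = value
        else:
            info_dict[item] = True  # flag-type INFO key such as STR
    return {
        "chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
        "filters": flt.split(";"),
        "info": info_dict,
        # FORMAT keys and sample values are paired positionally.
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }


def naive_alt_fraction(record):
    """Alt read count divided by total read count, taken from the AD field."""
    depths = [int(x) for x in record["sample"]["AD"].split(",")]
    return depths[1] / sum(depths)
```

Comparing `naive_alt_fraction(parse_vcf_record(RECORD))` with the reported `AF` value shows that the two need not coincide, which fits the remark that AF is treated as a latent variable estimated by the model and not as a simple read-count ratio.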

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @manba Mutect2 is under active development, and occasionally some of our workshop materials become out of date. Whatever you hear from @bhanuGandham is the most current information.

    As a developer of Mutect2, I want to stress that we run every release through an extensive suite of evaluations. It changes, but it is always production-ready.

  • manba Member ✭✭
    edited January 6

    @davidben @bhanuGandham, thanks for your help.
    But I have to say, it is precisely because I am not familiar with this that I have to look through many references, and I really do not know which ones are out of date. The picture I posted is even from the newest workshop materials, so it would be helpful if you released updated workshop materials.

    Q1:
    So even if, as you said, the PoN is mainly for mapping errors: if I put the paired normal sample into the PoN and run in PoN mode without the paired normal, will it make a big difference compared with a PoN that does not include the paired normal sample?

    Q2:
    Moreover, for both GATK3 and GATK4: "Filtering of sites in the panel of normals (PoN) and the matched normal remains unchanged, except that the tool will prefilter most of these such that site records are absent from the VCF."

    So you said the normal will filter most germline variants, but the PoN has little or no effect on germline variants. How do you explain the statement in Q2?

    thanks a lot

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @manba

    Q1: The panel of normals is more powerful than a single matched normal.
    Q2: The germline model is much more sophisticated in GATK 4.

  • ElenaGrassi Member
    edited January 7

    Sorry for the delay, I somehow lost the email notifications :neutral:
    @bhanuGandham thank you for your answer.

    @bhanuGandham said:
    1) The PoN is actually not useful for filtering out germline calls. That's the job of the --germline-resource (which should be gnomAD). It's actually an undesirable side effect of the PoN that we'll address in the next few months.

    Mh, ok, I see, considering that I have no way to obtain a PoN I'll cross my fingers.

    2) Users shouldn't use --af-of-alleles-not-in-resource unless they really know what they're doing. The defaults are smart.
    As for POP_AF=1, I would think that in tumor-only mode the germline filter would always catch that without any extra filtering. The user is running FilterMutectCalls, right?

    I am using FilterMutectCalls, yes - they are not filtered as germline, no - do you want more details on how I call Mutect2/FilterMutectCalls&co?
    It's strange to hear that we should not use --af-of-alleles-not-in-resource, since I read about it in a basic tutorial: https://software.broadinstitute.org/gatk/documentation/article?id=11136 and it made sense to adapt it to the gnomAD resource...

    Moreover reading this https://github.com/broadinstitute/gatk/issues/4745 and comparing Mutect2 calls with other callers I managed to recover a lot of (reasonably true) lost mutations with AF in range ~0.5 setting it to 0, so now I'm definitely confused :neutral:

  • davidben Boston Member, Broadie, Dev ✭✭✭

    @ElenaGrassi

    Mh, ok, I see, considering that I have no way to obtain a PoN I'll cross my fingers.

    A generic PoN is much better than no PoN at all. In fact, we perform all of our internal validations with unmatched generic PoNs just to keep ourselves honest and avoid overfitting, as well as to simulate the performance a typical user can expect without having access to hundreds of normals. The featured workspace in Firecloud has some.

    It's strange to hear that we should not use --af-of-alleles-not-in-resource, since I read about it in a basic tutorial.

    That tutorial was accurate at the time of its writing. We have since changed the defaults and given different defaults for tumor-only and tumor-normal modes. The defaults in the latest releases are good, whereas older defaults (mea culpa!) were not.

    Another thing to be aware of is that there is no official hg38 gnomAD yet, and lifting-over the hg37 version is surprisingly non-trivial. We recently fixed a bug related to that.

  • ElenaGrassi Member

    @davidben said:

    A generic PoN is much better than no PoN at all.

    Uh, that's nice. I was under the impression that, without matching the sequencing method, technical artifacts would not be caught so efficiently. I'll look into Firecloud, thanks!

    That tutorial was accurate at the time of its writing. We have since changed the defaults and given different defaults for tumor-only and tumor-normal modes. The defaults in the latest releases are good, whereas older defaults (mea culpa!) were not.

    Ok, it's hard to stay on top of things, thank you very much for your help - latest releases == ?
    I am using a docker image with 4.0.11.0.

    Another thing to be aware of is that there is no official hg38 gnomAD yet, and lifting-over the hg37 version is surprisingly non-trivial. We recently fixed a bug related to that.

    When I was looking for it I downloaded ftp://[email protected]/bundle/Mutect2/af-only-gnomad.hg38.vcf.gz, should I check if it has been updated?

  • ElenaGrassi Member
    edited January 9

    Thanks, I'll switch to 4.1 when it's out there! I need to run things on a cluster here, so paying for Firecloud is not an option, but I will definitely at least try to get the generic PoN to see if things change. ASAP I also hope to obtain the % of mutations with VAF ~ 1 that fall in regions with a deletion on the other allele and thus make sense.

    My user experience (coming from being more of a developer :)) has been very good overall: the documentation and the feedback here are great, and having docker images is perfect. The main issue is checking whether the information found in the forum/guides/etc. is up to date or not. Setting up the whole pipeline has been technically easy, but given the complexity of the question and the number of parameters, determining which ones need to be adapted to our specific goals and which ones should be left as they are is hard, especially because we do not really have a ground truth to compare with right now, and I am not sure whether asking my boss for some time to generate an in silico dataset would be a wise decision.

    Again thank you very much for your help!

  • bshifaw moon Member, Broadie, Moderator admin

    Thanks @ElenaGrassi, we appreciate the feedback!!
    If you could mark any of the above posts that helped answer your question, that would also be appreciated.
