We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Mutect2

Hi,

I am running some initial experiments with Mutect2 and am seeing some seemingly very clear calls being missed. These calls are made by the original Mutect (and other callers). Stepping through the code for one of the calls of interest, it seems that calls are only made if both the tumor LOD and normal LOD exceed thresholds. Should not the normal LOD be allowed to be lower than the threshold?

From SomaticGenotypingEngine (when this passes, the call is created):
if (tumorLod >= MTAC.INITIAL_TUMOR_LOD_THRESHOLD && normalLod >= MTAC.INITIAL_NORMAL_LOD_THRESHOLD)

Command:
-T MuTect2 -R /home/lmose/reference/chr7/chr7.fa -I:tumor /home/lmose/dev/edico/mutect/egfr1/tumor.bam -I:normal /home/lmose/dev/edico/mutect/egfr1/normal.bam -L /home/lmose/dev/edico/mutect/egfr1/test.bed -o /home/lmose/dev/edico/mutect/egfr1/precompiled/test.vcf --debug

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Glad to hear your problem seems to be resolved.

  • ZoldraxZoldrax RussiaMember

    Hi! I get to many "alt_allele_in_normal" variants with default MuTect2 settings. It looks like MuTect2 does not filter base quality in normal. I have added base quality off flags " -A -Q 0" to samtools mpileup for watching this Alt Allele in normal sample. There is only one "c" allele in normal sample and it has bad quality and is "anomalous read pairs". There are other calls liked this.

    For example see attached text file.

    Issue · Github
    by Sheila

    Issue Number
    582
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • kchang3kchang3 Member

    If I have tumor and matched normal. Is the best practice for Mutect2 to generate PON first. Then run Mutect2 with tumor and normal pair as well as with PON?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Yes, that's correct.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @Zoldrax, MuTect2 performs some reassembly that may modify the alignment of reads, which could explain what you're seeing. See this tutorial for explanations of how you can get MuTect2 to produce an output bam file showing the reassembled region. The tutorial is written for HaplotypeCaller but the commands can be adapted for MuTect2.

  • ZoldraxZoldrax RussiaMember

    @Geraldine_VdAuwera, Thank you! I've seen bam after reassembling, but have not seen any significant differences. However I replaced read group names in my bam files to simplest with picard AddOrReplaceReadGroups tool and had significant lesser "alt_allele_in_normal" values in the filter field, include SNP from example in my previous post. The SNP has become "PASS". IMHO, you should add checking for allowable read groups names in input bam files, if it is significant to results.
    Now I'm seeing new result more thoroughly and looking differences in some SNP with MuTect v1.

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    @Zoldrax
    Hi,

    Can you clarify what you mean by "allowable read group names"?

    Thanks,
    Sheila

  • ZoldraxZoldrax RussiaMember
    edited March 2016

    @Sheila, I don't know exactly too :) But I think that the group names or other fields around they was incorrect in my bam files for MuTect2 and picard AddOrReplaceReadGroups tool corrected this bug. It would be better if MuTect2 print error if it can't read and differentiate read groups correctly, if they significant to result.

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    @Zoldrax
    Hi!

    I did have a slight smile when reading your response :smile: I'm amazed you managed to fix the issue without knowing exactly what the problem is! Good for you :grin:

    As for potential issues with read groups, we do have a document here that I hope will help.

    -Sheila

  • manon_sourdeixmanon_sourdeix FranceMember

    Hi everyone !
    I have a similar problem when using MuTecT2 on my data (Illumina amplicon based sequencing, tumor/normal pair).

    I have a confirmed variant in the normal at position chr13:32912299 that is not called at all in the vcf output from MuTect2 but was called and identified as germline with the version 1 of MuTect.

    I first thought it might also be a read group problem so I used Picard to replace the read groups by the followings to be sure they are different between the 2 bams
    RGID="foo_normal"
    RGLB="bar_normal"
    RGPL="illumina"
    RGSM="Sample1_normal"
    RGPU="runID1"
    AND
    RGID="foo_tumor"
    RGLB="bar_tumor"
    RGPL="illumina"
    RGSM="Sample2_tumor"
    RGPU="runID2"

    My command is as such :
    java -jar GenomeAnalysisTK.jar -T MuTect2 -R ucsc.hg19.fasta --dbsnp dbsnp_138.hg19.vcf --cosmic Cosmic_hg19_v75_25112015.vcf --input_file:normal normal_with_RG.bam -L chr13:32912000-32913000 --input_file:tumor tumor_with_RG.bam --out test.vcf -bamout mutect_bam.bam

    And I attached an IGV screenshot of the bamout file which clearly shows that the variant exists.

    Thanks a lot for the suggestions.
    Manon

  • manon_sourdeixmanon_sourdeix FranceMember

    @Geraldine_VdAuwera and @Sheila should I be posting my question as a new discussion ? It seems that the problem is similar although I could not work it out.

    Thanks a lot for your help.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Next time maybe post as a new discussion; for this time it's fine, we'll address it here.

    Can you show what the position looks like for both tumor and normal?

  • manon_sourdeixmanon_sourdeix FranceMember

    No problem.

    I attached the IGV screenshots at that position for normal and tumor recalibrated bams (the ones I give in input to MuTect2).

    The variant seems to be present in both, It has been confirmed as a germline event. Oh and when I use HaplotypeCaller on the bamout it does call the variant so I can't figure out why it not called by MuTect2.

    Thanks a lot
    Manon

    Issue · Github
    by Sheila

    Issue Number
    686
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    edited March 2016

    Hi Manon, thanks for posting additional details. We need to consult the devs regarding this, it may take a few days to get you an answer.

    Post edited by Geraldine_VdAuwera on
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @manon_sourdeix, we have verified that this is actually the expected behavior: we're not actually expecting MuTect to emit germline calls. It's only meant to emit calls that are either likely to be somatic or
    borderline/uncertain.

    Here's the more technical explanation given by the developer:

    Emission of a variant is determined in part by the normal LOD. That's the log-odds of the normal being homozygous-reference (homRef) vs. heterozygous, where higher normalLOD means the site is more definitely homRef. To be specific, the normalLOD has to exceed the INITIAL_NORMAL_LOD_THRESHOLD, which is 2 by default, in order to be output. This is similar to how the tumorLOD has to exceed the INITIAL_TUMOR_LOD_THRESHOLD, 4 by default, to be output at all. This is set up to allow for seeing one erroneous read in the normal as long as there are a bunch of reads supporting the reference to balance it out, to preserve sensitivity to real somatic calls.

  • manon_sourdeixmanon_sourdeix FranceMember

    Hi
    thank you for having had a look on my issue.
    Then what could be the solution to actually get those germline events, as for some patients, we cannot not study their constitutional DNA on its own ?
    Should I be changing the normal LOD threshold ? To which value you recommend ? Will it actually change something ?

    Or should I be doing it in 2 steps and use HC to call germline events on the bamout (or on the initial bam ?) as a second step ?
    Any recommendation about that ?

    Thanks a lot
    Manon

    P.S : MuTect v1 did call these germline variants and annotated them as "germline_risk"

  • manon_sourdeixmanon_sourdeix FranceMember

    Hi @Geraldine_VdAuwera, I actually have a more precise question/issue.

    When I run the same command without the normal bam, I expect to have this variant in the vcf, but I don't. Is MuTecT not outputing the variants with a frequency of 0/5, considering them as normal ?? But when you do not have the normal bam, you cannot know so it should be outputed shouldn't it ?

    Issue · Github
    by Sheila

    Issue Number
    697
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    I see you're using a dbsnp file. Have you checked whether this variant has been reported previously in dbsnp? If it has, that could penalize the variant lod score enough to exclude it.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Oh I just saw your previous comment. Yes, you would have to run HaplotypeCaller separately on the initial bam file. MuTect is just not designed to make germline calls, and its genotyping model is not appropriate for that purpose. Even if the original MuTect produced calls for such sites, they would not have been appropriate for studying a person's germline constitution.

  • manon_sourdeixmanon_sourdeix FranceMember

    Hi thank you @Geraldine_VdAuwera for your answers. I do not intend to study the person's germline constitution. But if that woman has a mutation in BRCA1 in the somatic tissue I want to know it, even if it looks like a germline variant.

    In fact when I launch without the normal bam and without dbsnp, the variant does not appear at all. Is that because of its frequency ? Is MuTect in tumorOnly mode supposed to discard completely variants looking germline based on frequency ?

    I just want to be sure MuTect is behaving the way it should be. I thought, by reading the docs, that such variants would be called BUT filtered mentioning " germline_risk". Am I wring ? Is this variant too obviously germline to be called and annotated germline_risk then ?

    FILTER=<ID=germline_risk,Description="Evidence indicates this site is germline, not somatic">

    Thanks again for the explanations !

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    @manon_sourdeix
    Hi Manon,

    Yes, the variant is not getting called most probably because the frequency is high, which is not expected in the somatic context. We think it's doing the right thing, but we'll have a look to make sure. Instructions for submitting test files are here.

    Thanks,
    Sheila

Sign In or Register to comment.