Difference between MuTect 1.1.7 and MuTect2

rdali094 (McGill University, Member)

I am trying to detect somatic SNPs in a normal/tumour pair. I tried MuTect2 under GATK (java -jar GenomeAnalysisTK.jar -T MuTect2) and VarScan 2. To my surprise, there was very little overlap between the predictions (< 1%). In a paper, VarScan 2 and MuTect had more in common, but the paper had tested the standalone MuTect (MuTect 1.1.4). I tried that, but it wasn't compatible with my Java 7 environment, so I tried standalone MuTect 1.1.7 instead (java -Xmx2g -jar mutect-1.1.7.jar --analysis_type MuTect).
To my surprise, there is still a big difference between the "Pass" predictions of MuTect2 and the "Keep" predictions of MuTect 1.1.7, which are also very different from those of VarScan 2... Is this normal? What do I trust?
I certainly expected more overlap...


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    How did you do the comparison? This could be a variant representation issue.

  • rdali094 (McGill University, Member)

    I am only looking at a single chromosome. So I intersected the position by cutting out column2 and looking for overlaps.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    What is "column2"? How do you "look for overlaps"? Try to imagine that I can't read your mind and don't know what you are assuming that I might know about your methods, file formats, the programs you're using, etc.

  • rdali094 (McGill University, Member)

    Yes of course. Sorry.

    I am only looking at a single chromosome. So I am using the SNP position along the chromosome to check if the same SNPs are called by both tools.

    grep -v REJECT out.muTect1.7.vcf | cut -f2

    results in the positions of SNPs along my chromosome that were called by MuTect 1.1.7 and that passed the filters.

    I do the same with the output of MuTect2:

    grep "PASS" out.muTect2.vcf | cut -f2

    results in the positions of SNPs that were called by MuTect2 and that passed the filters.

    Then I just check the 2 outputs for overlaps. I use for a quick idea.

    Am I doing something wrong?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    I see, thanks for clarifying. None of that is wrong in principle but in practice it's a very low-information way of looking at variants. I would recommend instead using a tool like GATK's VariantEval, for which we have both a method document and an applied tutorial document.

    Keep in mind also that MuTect2 calls indels at the same time as SNPs, whereas the other two don't, but you don't seem to be accounting for that.
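
    One way to take indels out of the picture before comparing positions is sketched below. It is only an illustration, not something taken from the GATK documentation: the function names snv_keys and shared_snv_count are made up, and the sketch assumes uncompressed, tab-delimited VCFs with uppercase bases. Sites are keyed on CHROM, POS, REF and ALT rather than on position alone, and header lines are skipped.

```shell
# Hypothetical helper: print one CHROM:POS:REF:ALT key per SNV record in a VCF.
# Assumes an uncompressed, tab-delimited VCF with uppercase bases.
# Indels, multi-allelic records and symbolic alleles are dropped.
snv_keys() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ {   # single-base REF and ALT only
            print $1 ":" $2 ":" $4 ":" $5
        }
    ' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: count the SNV keys that two VCFs have in common.
shared_snv_count() {
    # Each key list is already de-duplicated, so any key that appears twice
    # in the combined stream must be present in both files.
    { snv_keys "$1"; snv_keys "$2"; } | LC_ALL=C sort | uniq -d | wc -l | tr -d ' '
}
```

    For example, shared_snv_count out.muTect1.7.vcf out.muTect2.vcf would print the number of SNV records the two files share, regardless of their filter status.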

    I would also recommend including variants that failed filters in your evaluation. What you are seeing could be the result of the different filtering strategies applied by the different tools. You need to distinguish the case where one tool does not call a variant at all when another does from the case where it calls the variant initially and then filters it out.
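
    As a rough sketch of that distinction (again only an illustration: compare_filters and the not_called label are hypothetical, and uncompressed, tab-delimited VCFs are assumed), the FILTER column of both files can be lined up site by site, so that a variant missing from one file is reported differently from a variant that is present but filtered:

```shell
# Hypothetical helper: for every site found in either VCF, print
#   CHROM:POS:REF:ALT <tab> FILTER in the first file <tab> FILTER in the second file
# with "not_called" where a file has no record for that site.
# Assumes uncompressed, tab-delimited VCFs that each contain at least one line.
compare_filters() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 { file++ }    # a new input file starts
        /^#/ { next }          # skip header lines
        {
            key = $1 ":" $2 ":" $4 ":" $5
            if (file == 1) first[key] = $7
            else second[key] = $7
            seen[key] = 1
        }
        END {
            for (key in seen) {
                f1 = (key in first) ? first[key] : "not_called"
                f2 = (key in second) ? second[key] : "not_called"
                print key, f1, f2
            }
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

    Piping the output through cut -f2,3 | sort | uniq -c gives a quick count of each combination, for example how many sites one tool rejected while the other never emitted them.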

    I'd also recommend choosing a subset of the calls and looking at them relative to the sequence data in IGV to get a sense of whether the calls that are made by one tool but not by another look reasonable.

    In short, don't trust any raw numbers.

  • I have the same question. There is very little intersection between MuTect1 and MuTect2. Any suggestions?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    @dtcdtcdtcdtc At this time we're not able to comment on differences between M1 and M2. MuTect2 is still in beta phase and is being actively worked on. We expect to have an improved version out by the end of the quarter (in a couple of months).

  • @Geraldine_VdAuwera Thanks for the reply. I also found that the raw output of MuTect2 (raw.vcf) has fewer variants evaluated than MuTect1's raw output (raw_call_stats.out) for the same data. Can you explain it?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    No, see my comment above.