High-depth, tumor-only variant calling with Mutect2


I'm trying to call somatic variants (SNVs and indels) on targeted sequencing data (usually from amplicon-based enrichment). MuTect1 seems to work very well, but Mutect2 is proving more difficult: the reported allele frequencies are often overestimated compared to what IGV shows, and both the false positive and false negative rates are high.

I have seen threads about some parameters for deep(er) sequencing, tumor-only calling, or amplicon panels, but I'm wondering whether there is a set of "best practices" and parameter values that would be beneficial in this particular setting.

Sequencing depths range from ~500 to 10,000-20,000 reads, and the sequenced region spans 10 to 100 kb. In amplicon data, some positions in each amplicon (the amplicon ends) are covered by only one read orientation and others by both, with a short transition between them. There are no normal samples.

(Currently, I'm using Mutect2 in GATK3, but information on GATK4 would of course be welcome too.)

Thanks and good day,


  • Sheila (Broad Institute; Member, Broadie)

    Hi Thomas,

    You can try with GATK4 and let us know how it performs. The developers are focusing all efforts on GATK4 now, so any issues in GATK3 will not be fixed.

    I may need you to submit a bug report, as the team is trying to improve the reassembly algorithm, and your test case may help.


  • thomasdelcourt (bxl; Member)

    Hi Sheila,

    Thank you for your answer.
    I'll try to find some time to test this in GATK4 soon.

    My question above was meant as a general one rather than a report of a specific bug or issue. I was wondering whether there is any general information or material on this kind of application of Mutect2, e.g. parameters that could have an impact, or pre-processing steps to tweak for the setting I'm using it in.
    The main characteristics of the sequencing data (which may differ from what the best practices assume) are the high depth, the rather short total length of the sequenced region, and the very sharp depth transitions caused by the amplicons.


  • Sheila (Broad Institute; Member, Broadie)

    Hi Thomas,

    The issue is that Mutect2 reassembly seems to not do a great job at the ends of reads. The team is working on improving this, but as of right now, there are no arguments that can help you out. If you get a chance to prepare some test files, you can submit them using instructions here.


  • escaon (Limoges, France; Member)

    Hi all,
    Our team is also very interested in guidelines for applying Mutect2 to amplicon sequencing data. Any tips are welcome.
    Best regards

  • Sheila (Broad Institute; Member, Broadie)


    Have you found any odd cases when using Mutect2 on amplicon data? We don't have much experience working with amplicon data, but perhaps some of the other users who have experience will jump in here.


  • Hello,

    We are also working with Mutect2 to call somatic variants (SNVs and indels) on targeted sequencing data. We are currently using GATK version.
    Our problem concerns some indels: we are able to detect them, but the depth statistics are not correct. These mutations are heterozygous, yet the reported read counts include only the reads that carry the mutation and not the reads that do not, so both the depth and the allele frequency are affected.

    VCF result:
    chr7 55181317 . G GACAACCCCC . . DP=968;ECNT=5;POP_AF=5.000e-08;RPA=1,2;RU=ACAACCCCC;STR;TLOD=4345.54 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:1,967:0.998:968:1,967:0,0:23,23:144,150:60:54:0:0.990,0.990,0.999:0.025,0.030,0.945
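
A quick way to see what the record above is reporting is to pull out the per-sample fields and recompute the alt-read fraction from AD. The following is a minimal Python sketch, not part of the original post: the helper names `sample_fields` and `alt_fraction_from_ad` are made up for illustration, and it assumes a single-sample, biallelic record whose columns are separated by whitespace, as pasted here. Mutect2's AF is a model-based estimate, so it does not have to match the raw AD ratio exactly.

```python
# The VCF record from the post, with columns separated by whitespace as pasted.
record = (
    "chr7 55181317 . G GACAACCCCC . . "
    "DP=968;ECNT=5;POP_AF=5.000e-08;RPA=1,2;RU=ACAACCCCC;STR;TLOD=4345.54 "
    "GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB "
    "0/1:1,967:0.998:968:1,967:0,0:23,23:144,150:60:54:0:0.990,0.990,0.999:0.025,0.030,0.945"
)


def sample_fields(vcf_line):
    """Map FORMAT keys to the values of the first (only) sample column."""
    cols = vcf_line.split()
    # Column 9 is FORMAT and column 10 is the sample (1-based VCF numbering).
    return dict(zip(cols[8].split(":"), cols[9].split(":")))


def alt_fraction_from_ad(fields):
    """Raw alt-read fraction from AD (ref,alt); biallelic records only."""
    ref_count, alt_count = (int(x) for x in fields["AD"].split(","))
    return alt_count / (ref_count + alt_count)


fields = sample_fields(record)
raw_fraction = alt_fraction_from_ad(fields)

print("AD (ref,alt):", fields["AD"])
print("reported AF :", fields["AF"])
print("AD-based fraction: %.4f" % raw_fraction)
# F1R2/F2R1 hold ref,alt counts per read-pair orientation.
print("F1R2:", fields["F1R2"], "F2R1:", fields["F2R1"])
```

With AD at 1,967, only one reference read is counted at this site, which is the behaviour described in the post. The F1R2 and F2R1 fields also show that every counted read comes from a single orientation.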

    We know the correct mutation frequency that should be detected, and we can see the real depth in IGV, as shown in the attached images.

    Here are the commands we are using:
    gatk- "-Xmx30g" BaseRecalibrator -I processed.bam -R ~/Homo_sapiens.GRCh38.fa --intervals ~/capture.bed --known-sites ~/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites /All_20180418.vcf.gz --output p.table

    gatk- --java-options "-Xmx30g" ApplyBQSR -R ~/Homo_sapiens.GRCh38.fa -I processed.bam --bqsr-recal-file p.table -O post_p.bam

    gatk- --java-options -Xmx30g Mutect2 --native-pair-hmm-threads 20 -R ~/Homo_sapiens.GRCh38.fa -I post_p.bam -L ~/targeted.bed -tumor name -O ~/result.vcf --max-reads-per-alignment-start 0 -min-base-quality-score 1 --base-quality-score-threshold 6

    We would really appreciate your help in understanding why we are getting these results.

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
  • davidben (Boston; Member, Broadie, Dev)

    Also, we have improved a lot of things related to indels, especially indels in STRs, since that version, so you might want to try running the most recent release. If the results still look strange, feel free to post an IGV screenshot of the original BAM and the M2 bamout.

  • davidben (Boston; Member, Broadie, Dev)

    @jmartin_incliva @escaon @thomasdelcourt Also, for high-depth calling I strongly recommend switching to the latest Mutect2 release. We recently made some big improvements specifically for high-depth calling. The intent was to support mitochondrial calling but we have found it also improves somatic calling at all depths.
