Optimizing Mutect2 runs on whole genomes?

khadi_NYGC, New York Genome Center, Member

Dear GATK,

Given the most current optimized way to run Mutect2 on whole genomes at about 40-60X coverage (~300 GB), how long can I expect it to run on one whole genome? In particular, what parameters or practices do you recommend for generating a panel of normals (PON) with Mutect2?

I am running Mutect2 on several whole genomes to generate a PON, using the multithreading option -nct 3 for each BAM. Seven days after starting this job, the run has only completed calls on chromosomes 1 and 2 for one whole-genome BAM.

I plan to restart the run using a scatter-gather approach, splitting the Mutect2 job for one whole genome into some number of intervals. From my search of the forums, this seems to be the consensus on how best to run Mutect2. However, I would really appreciate any other recommendations.



  • Geraldine_VdAuwera, Cambridge, MA. Member, Administrator, Broadie admin

    Hi Kevin, scatter-gather is indeed our method of choice for this. Note that while the implementation of MuTect2 in GATK 3.* is very slow, the development team is currently making a big push to accelerate MuTect2 in the GATK4 framework, which will be released into beta status in about a month and general release probably in late June.
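For anyone landing on this thread, here is a rough sketch of what the scatter step could look like for one normal BAM. It is not an official Broad workflow. The file names (`GenomeAnalysisTK.jar`, `normal1.bam`, `ref.fasta`), the contig lengths, and the 50 Mb chunk size are placeholders chosen for illustration, and the flags follow the GATK 3 MuTect2 usage for PON creation as I understand it (`-I:tumor` plus `--artifact_detection_mode`), so please check them against the documentation for your installed version. The script only builds the per-interval command lines; submitting them to a scheduler and gathering the per-interval VCFs afterwards is left out.

```python
# Sketch only: split each contig into fixed-size intervals and build one
# MuTect2 command per interval. All paths and sizes are hypothetical.

def split_contig(contig, length, chunk_size):
    """Return 1-based, inclusive intervals ("contig:start-end") covering the contig."""
    intervals = []
    start = 1
    while start <= length:
        end = min(start + chunk_size - 1, length)
        intervals.append(f"{contig}:{start}-{end}")
        start = end + 1
    return intervals


def build_scatter_commands(bam, reference, contig_lengths, chunk_size, out_prefix,
                           jar="GenomeAnalysisTK.jar"):
    """Build one command (as an argument list) per interval.

    Assumes GATK 3-style arguments for calling a normal in artifact
    detection mode; verify the flags for your GATK version.
    """
    commands = []
    shard = 0
    for contig, length in contig_lengths.items():
        for interval in split_contig(contig, length, chunk_size):
            out_vcf = f"{out_prefix}.{shard:04d}.vcf"
            commands.append([
                "java", "-jar", jar,
                "-T", "MuTect2",
                "-R", reference,
                "-I:tumor", bam,
                "--artifact_detection_mode",
                "-L", interval,
                "-o", out_vcf,
            ])
            shard += 1
    return commands


if __name__ == "__main__":
    # Toy contig lengths; real values would come from the reference .fai/.dict.
    toy_lengths = {"chr1": 120_000_000, "chr2": 90_000_000}
    for cmd in build_scatter_commands("normal1.bam", "ref.fasta", toy_lengths,
                                      50_000_000, "normal1"):
        print(" ".join(cmd))
```

Each printed line is an independent job, so the wall-clock time is bounded by the slowest interval rather than the whole genome. The zero-padded shard index keeps the output VCFs in genomic order when sorted by name, which makes the gather step simpler.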

