Recommendations for speeding up MuTect2 on deep capture data?

Hi GATK Team,

I'm trying to move a pipeline over from MuTect1 and Indelocator to MuTect2. The difference in runtime I'm seeing is much larger than I was expecting. I did see this post, and Sheila's answer, but I'd like to ask again with some more specifics.

I'm running on a pair of BAMs aligned with bwa-mem to hs38DH, and calling over a small set of target regions. The target regions are ~375 exons totaling about 57kb - a fairly tiny set of regions. The BAM files have a median coverage of around 500X over these target regions.

Running with MuTect1 this takes less than a minute and a half. Running with MuTect2 it takes nearly 30 minutes! I appreciate that MuTect2 is doing local assembly and indel calling, but a 20-fold increase in runtime doesn't seem right. For comparison, running the HaplotypeCaller on these same BAMs takes between 30 and 60 seconds. Again, not an apples-to-apples comparison, just a point of reference.

Any suggestions for why it's taking ~20 times longer to run, and how I might decrease that without tanking sensitivity? Thanks,

-t

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @tfenne
    Hi t,

    Hmm. You are right, that is a major difference in runtime. Can you please post the exact command you ran?

    Thanks,
    Sheila

  • tfenne, USA (Member)

    Sure, though it's pretty vanilla. The last three parameters are just re-stating the defaults FWIW:

    java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4096m \
      -jar /pipeline/packages/GenomeAnalysisTK.jar \
      -T MuTect2 \
      -R /pipeline/ref/hs38DH/hs38DH.fa \
      -L /pipeline/ref/57k_panel/targets.b38.interval_list \
      -I:tumor tumor.bam \
      -I:normal normal.bam \
      -o mutect2.vcf \
      --max_alt_allele_in_normal_fraction 0.03 \
      --max_alt_alleles_in_normal_count 2 \
      --max_alt_alleles_in_normal_qscore_sum 20
    
  • Sheila, Broad Institute (Member, Broadie, Moderator, admin)

    @tfenne
    Hi,

    Thanks. Unfortunately, we don't have any expectations right now on runtime performance, so I don't think we can do anything for you. However, you can try running with debug-level logging and see if progress is slow at any particular interval or if it's slow throughout.

    -Sheila
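
    A minimal sketch of the debug-logging suggestion, assuming GATK 3.x, where `-l DEBUG` is understood to raise the engine logging level and `-log` to write the log to a file. The log file name `mutect2.debug.log` and the `RUN` switch are illustrative choices, not part of the original command. By default the script only prints the command, so it can be checked before anything is run.

```shell
#!/bin/sh
set -eu

# Same MuTect2 invocation as posted in the thread, plus two logging options.
# Assumption: -l DEBUG and -log behave as described for GATK 3.x.
# mutect2.debug.log is a hypothetical output name.
set -- java -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4096m \
  -jar /pipeline/packages/GenomeAnalysisTK.jar \
  -T MuTect2 \
  -R /pipeline/ref/hs38DH/hs38DH.fa \
  -L /pipeline/ref/57k_panel/targets.b38.interval_list \
  -I:tumor tumor.bam \
  -I:normal normal.bam \
  -o mutect2.vcf \
  --max_alt_allele_in_normal_fraction 0.03 \
  --max_alt_alleles_in_normal_count 2 \
  --max_alt_alleles_in_normal_qscore_sum 20 \
  -l DEBUG \
  -log mutect2.debug.log

# Dry run by default: print the command. Set RUN=1 to execute it.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
else
  echo "$*"
fi
```

    Run as-is to inspect the command, or with `RUN=1` to execute it. Comparing the timestamps in the resulting log should show whether a handful of intervals account for most of the 30 minutes or whether the slowdown is spread evenly across the targets.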
