
Reduce mutect running time

ariehtal · Amsterdam · Member
edited May 2015 in MuTect v1

I'm running mutect-1.1.7.jar with Java 1.7 on the dutch-lsgrid. Because MuTect cannot run in parallel, the job runs longer than the grid's 72-hour wall-time limit.
How can I reduce the running time using MuTect options? I'm already looking into using a faster CPU.
This is the command line that I'm using:
    ./jre1.7.0_76/bin/java -Xmx2g -jar mutect-1.1.7.jar -T MuTect \
        -R Homo_sapiens_assembly19.fasta \
        --dbsnp dbsnp_132_b37.leftAligned.vcf \
        --cosmic b37_cosmic_v54_120711.vcf \
        --input_file:normal $ATAL_DIR/$NORMALPATH \
        --input_file:tumor $ATAL_DIR/$TUMORPATH \
        --out $MUTECT_OUT

Thanks, Arieh Tal
a.tal@nki.nl
Dutch Cancer Institute, NKI
Amsterdam, Holland


Comments

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Hi there,

    The best way to deal with this problem is to parallelize your analysis. Try running MuTect per chromosome using the -L argument, submitting each chromosome as a separate job. If that still takes too long, you can split the job into smaller intervals.
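A minimal sketch of the scatter described above, assuming b37 contig names (to match Homo_sapiens_assembly19.fasta), a samtools-style .fai index next to the FASTA for contig lengths, and a hypothetical output naming scheme "<prefix>.<interval>.txt". It only builds the original command with an added -L; submitting the jobs to the grid is left to your scheduler.

```python
# Sketch: scatter one MuTect 1.1.7 run into independent jobs using -L.
# Assumptions (hypothetical, adjust to your setup): b37 contig names,
# a .fai index for contig lengths, output names "<prefix>.<interval>.txt".
from typing import Dict, Iterable, Iterator, List

# Primary b37 contigs; the b37 reference also contains unplaced GL contigs.
B37_CONTIGS = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def contig_lengths(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Read contig name -> length from the first two columns of a .fai index."""
    lengths = {}
    for line in fai_lines:
        name, length = line.split("\t")[:2]
        lengths[name] = int(length)
    return lengths


def split_contig(contig: str, length: int, chunk: int) -> Iterator[str]:
    """Yield 1-based, inclusive intervals 'contig:start-end' of at most `chunk` bp."""
    for start in range(1, length + 1, chunk):
        yield f"{contig}:{start}-{min(start + chunk - 1, length)}"


def mutect_command(interval: str, normal: str, tumor: str, out_prefix: str) -> List[str]:
    """Build the original MuTect command, restricted to one interval via -L."""
    label = interval.replace(":", "_")  # keep ':' out of file names
    return [
        "java", "-Xmx2g", "-jar", "mutect-1.1.7.jar", "-T", "MuTect",
        "-R", "Homo_sapiens_assembly19.fasta",
        "--dbsnp", "dbsnp_132_b37.leftAligned.vcf",
        "--cosmic", "b37_cosmic_v54_120711.vcf",
        "--input_file:normal", normal,
        "--input_file:tumor", tumor,
        "-L", interval,
        "--out", f"{out_prefix}.{label}.txt",
    ]


# Per-chromosome scatter: one command (one grid job) per contig.
jobs = [mutect_command(c, "normal.bam", "tumor.bam", "mutect") for c in B37_CONTIGS]
```

For smaller pieces, read the lengths with contig_lengths(open("Homo_sapiens_assembly19.fasta.fai")) and pass each contig through split_contig(name, length, 10_000_000) before building the commands; the 10 Mb chunk size is only an example.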

  • ariehtal · Amsterdam · Member

    Thanks, Geraldine, I'll try it. Best regards, Arieh

  • Lien · Leuven · Member

    Dear Geraldine,

    I have a question about this parallelization. I ran MuTect2 (GATK version 3.6) on the entire BAM file (from a capture experiment). For the -L argument, I supplied the captured region plus 100 bases on each side.
    For your information, this is the command line I used:
    /jdk/1.8.0/bin/java -jar /gatk/3.6/GenomeAnalysisTK.jar -T MuTect2 \
        -R /hg38b/genome.fa \
        --cosmic /hg38b/CosmicCodingMuts_v77_withChr_vcfsorter.vcf \
        --dbsnp /hg38b/dbsnp147_All_20160407.vcf \
        -L /hg38b.analysis.variant_calling.target.bed \
        -I:tumor sample.bam --normal_panel /hg38b/hg38b.normal_panel.vcf \
        -o /sample_Mutect2_hg38_chr2.vcf

    Because I have a lot of samples to analyze, I also tried running MuTect2 per chromosome: in the first command the -L argument gave only the captured region (plus 100 bases) on chr1, the next command did the same for chr2, and so on. Afterwards, I merged all the files into one VCF.
    However, I noticed that the two 'methods' do not give identical results. For example, one mutation passed the MuTect2 criteria when I ran MuTect2 on all chromosomes together, but was filtered as 'clustered events, homologous mapping event' when I ran it chromosome by chromosome. This site had 743 reads, of which 254 contained the variant allele.
    For one sample, I see several similar cases.
    Is this because some statistics change when working per chromosome? And would it be OK to continue running the analyses per chromosome?
    Before starting, I wouldn't have guessed that there could be a difference in the output. And now I'm not sure which 'method' yields the 'best' results.
    Many thanks for your thoughts on this.
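The per-chromosome setup Lien describes can be scripted roughly as follows. This is a sketch, not her actual procedure: it assumes a standard tab-separated BED (0-based, half-open coordinates), applies the 100-base padding by hand, clamps starts at 0 but does not check chromosome ends, and uses hypothetical output names "<prefix>.<chrom>.bed". Using the same padding as the whole-genome run keeps the two runs covering the same territory.

```python
# Sketch: split a capture BED into padded per-chromosome BED files for -L.
# Assumptions: standard BED (tab-separated, 0-based half-open coordinates);
# padding clamped at 0 only (chromosome ends are not checked); output names
# "<prefix>.<chrom>.bed" are hypothetical.
from typing import Dict, Iterable, List


def split_padded_bed(lines: Iterable[str], pad: int = 100) -> Dict[str, List[str]]:
    """Group BED records by chromosome, widening each by `pad` bp on both sides."""
    per_chrom: Dict[str, List[str]] = {}  # dicts keep insertion order (3.7+)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        padded = [chrom, str(max(0, start - pad)), str(end + pad)] + fields[3:]
        per_chrom.setdefault(chrom, []).append("\t".join(padded))
    return per_chrom


def write_per_chrom(per_chrom: Dict[str, List[str]], prefix: str) -> List[str]:
    """Write one <prefix>.<chrom>.bed per chromosome; return the file paths."""
    paths = []
    for chrom, records in per_chrom.items():
        path = f"{prefix}.{chrom}.bed"
        with open(path, "w") as handle:
            handle.write("\n".join(records) + "\n")
        paths.append(path)
    return paths
```

Each returned path can then be passed to its own MuTect2 job with -L in place of the full target BED.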

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie
    No, there shouldn't be any difference between the two ways you described. The statistics should be the same. Can you show a few examples of the calls that were made in one case but not the other?
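One way to collect such examples is to compare the two VCFs site by site. The sketch below assumes both files are plain-text (not bgzipped) VCFs, such as the whole-genome output and the merged per-chromosome output, and keys each call by (CHROM, POS, REF, ALT); it reports every site whose FILTER value differs, using the hypothetical placeholder "absent" when a site appears in only one file.

```python
# Sketch: list sites whose FILTER differs between two MuTect2 VCFs.
# Assumption: plain-text VCFs; sites keyed by (CHROM, POS, REF, ALT);
# "absent" is a placeholder for a site missing from one of the runs.
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int, str, str]


def filters_by_site(lines: Iterable[str]) -> Dict[Key, str]:
    """Map (CHROM, POS, REF, ALT) -> FILTER for every data line of a VCF."""
    sites: Dict[Key, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt, _qual, flt = line.rstrip("\n").split("\t")[:7]
        sites[(chrom, int(pos), ref, alt)] = flt
    return sites


def diff_calls(a: Dict[Key, str], b: Dict[Key, str]) -> List[Tuple[Key, str, str]]:
    """Return (site, FILTER in a, FILTER in b) for every site where the runs disagree."""
    out = []
    for key in sorted(set(a) | set(b)):
        fa, fb = a.get(key, "absent"), b.get(key, "absent")
        if fa != fb:
            out.append((key, fa, fb))
    return out
```

For example, diff_calls(filters_by_site(open(whole_vcf)), filters_by_site(open(merged_vcf))) lists the disagreements, where whole_vcf and merged_vcf are hypothetical paths to the two outputs.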
  • Lien · Leuven · Member
    edited August 2016

    One example that was found when the analysis was run per chromosome:

    • chr20:58903718: 1730 reads, of which 10 contain the alternative allele (6 on the + strand, 4 on the - strand).

    A few examples that were found when the analysis was run on all chromosomes together:

    • chr2:29225494: 743 reads, of which 254 contain the alternative allele (125 on the + strand, 129 on the - strand).
      The per-chromosome analysis also reports this site, but filtered as 'clustered event, homologous mapping event' rather than 'pass'.

    • chr16:3769261: 1004 reads, of which 622 contain the alternative allele (308 on the + strand, 314 on the - strand).
      The per-chromosome analysis also reports this site, but filtered as 'clustered event' rather than 'pass'.

    • chr20:58840500: 1573 reads, of which 9 contain the alternative allele (5 on the + strand, 4 on the - strand).
      The per-chromosome analysis does not report this site.
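The read counts above translate into alternate-allele fractions as follows; the counts are copied from the post and the calculation is simply alt reads divided by total reads.

```python
# Alternate-allele fraction (alt reads / total reads) for the sites listed above.
SITES = {
    "chr20:58903718": (10, 1730),
    "chr2:29225494": (254, 743),
    "chr16:3769261": (622, 1004),
    "chr20:58840500": (9, 1573),
}


def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads supporting the alternate allele."""
    return alt_reads / total_reads


for site, (alt, total) in SITES.items():
    print(f"{site}: {alt}/{total} = {allele_fraction(alt, total):.3f}")
# chr20:58903718: 10/1730 = 0.006
# chr2:29225494: 254/743 = 0.342
# chr16:3769261: 622/1004 = 0.620
# chr20:58840500: 9/1573 = 0.006
```

The two sites filtered as 'clustered event' in the per-chromosome run are at high fractions (about 0.34 and 0.62), while the other two are both below 0.01.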

    The detailed MuTect2 output information is attached.
