Reduce mutect running time

ariehtal (Amsterdam, Member)
edited May 2015 in MuTect v1

I'm running mutect-1.1.7.jar with Java 1.7 on the dutch-lsgrid. Because MuTect cannot run in parallel, the job exceeds the grid's wall-time limit of 72 hours.
How can I reduce the running time using MuTect options? I'm already looking into using a faster CPU.
This is the command line I'm using:
./jre1.7.0_76/bin/java -Xmx2g -jar mutect-1.1.7.jar -T MuTect \
    -R Homo_sapiens_assembly19.fasta \
    --dbsnp dbsnp_132_b37.leftAligned.vcf \
    --cosmic b37_cosmic_v54_120711.vcf \
    --input_file:normal $ATAL_DIR/$NORMALPATH \
    --input_file:tumor $ATAL_DIR/$TUMORPATH \
    --out $MUTECT_OUT

Thanks, Arieh Tal
Dutch Cancer Institute, NKI
Amsterdam, Holland

Comments

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi there,

    The best way to deal with this problem is to parallelize your analysis. Try running MuTect per chromosome using the -L argument, with each chromosome as a separate job. If that still takes too long, you can split the job up into smaller intervals.
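
As a rough sketch of the per-chromosome approach, the loop below prints one MuTect command per contig, reusing the options from the original post and adding only -L. The contig names (1 to 22, X, Y) assume a b37-style reference and should be checked against the reference's .fai or .dict file. The per-contig output suffix and the function names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Contig names assume a b37-style reference (1..22, X, Y); verify them
# against the reference index before use.
CONTIGS="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y"

# Print the MuTect command for one contig ($1). Only -L and the
# per-contig output suffix (a hypothetical naming scheme) differ from
# the command in the original post.
mutect_cmd() {
    echo "./jre1.7.0_76/bin/java -Xmx2g -jar mutect-1.1.7.jar -T MuTect" \
         "-R Homo_sapiens_assembly19.fasta" \
         "--dbsnp dbsnp_132_b37.leftAligned.vcf" \
         "--cosmic b37_cosmic_v54_120711.vcf" \
         "--input_file:normal $ATAL_DIR/$NORMALPATH" \
         "--input_file:tumor $ATAL_DIR/$TUMORPATH" \
         "-L $1" \
         "--out $MUTECT_OUT.$1"
}

# Emit one command per contig; each line can be submitted as its own job.
print_all_cmds() {
    for contig in $CONTIGS; do
        mutect_cmd "$contig"
    done
}

print_all_cmds
```

The script only prints the commands, so they can be inspected before being handed to whatever job submission mechanism the grid provides. The per-contig outputs still have to be combined afterwards.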

  • ariehtal (Amsterdam, Member)

    Thanks Geraldine, I'll try it. Best regards, Arieh

  • Lien (Leuven, Member)

    Dear Geraldine,

    I have a question about this parallelization. I ran MuTect2 (GATK version 3.6) on the entire BAM file (from a capture experiment). For the -L argument, I entered the captured region plus 100 bases on each side.
    For your information, this is the command line I used:
    /jdk/1.8.0/bin/java -jar /gatk/3.6/GenomeAnalysisTK.jar -T MuTect2 \
        -R /hg38b/genome.fa \
        --cosmic /hg38b/CosmicCodingMuts_v77_withChr_vcfsorter.vcf \
        --dbsnp /hg38b/dbsnp147_All_20160407.vcf \
        -L /hg38b.analysis.variant_calling.target.bed \
        -I:tumor sample.bam --normal_panel /hg38b/hg38b.normal_panel.vcf \
        -o /sample_Mutect2_hg38_chr2.vcf

    Because I have a lot of samples to analyze, I also tried running MuTect2 once per chromosome: one command with the -L argument restricted to the captured region plus 100 bases on chr1, a next command for chr2, and so on. Afterwards, I merged all the output files into one VCF.
    However, I noticed that the results of the two 'methods' do not overlap completely. For example, one mutation 'passed' the MuTect2 criteria when I ran MuTect2 on all chromosomes together, but did not pass when I ran it chromosome by chromosome, where it was filtered as 'clustered events, homologous mapping event'. This mutation was covered by 743 reads, of which 254 contained the variant allele.
    For one sample, I have a few similar mutations.
    Is this because some statistics change when working per chromosome? Also, would it be OK to continue running the analyses per chromosome?
    Before starting, I wouldn't have guessed that the output could differ, and now I'm not sure which 'method' yields the 'best' results.
    Many thanks for your thoughts on this.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    No, there shouldn't be any difference between the two ways you described. The statistics should be the same. Can you show a few examples of the calls that were made in one case but not the other?
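
One way to collect such examples is to compare the FILTER column (column 7 of a VCF) site by site between the two outputs. The following is a minimal sketch, assuming both files are plain uncompressed VCFs. The function name and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Usage: compare_filters all_chromosomes.vcf per_chromosome_merged.vcf
# Prints: site <TAB> FILTER in first file <TAB> FILTER in second file
# for every site whose FILTER differs; "absent" marks a site that is
# missing from one of the two files.
compare_filters() {
    awk -F'\t' '
        /^#/ { next }                                  # skip VCF header lines
        { key = $1 ":" $2 ":" $4 ">" $5 }              # CHROM:POS:REF>ALT
        FILENAME == ARGV[1] { first[key] = $7; next }  # remember first file
        {
            seen[key] = 1
            if (!(key in first))       print key "\tabsent\t" $7
            else if (first[key] != $7) print key "\t" first[key] "\t" $7
        }
        END {
            # Sites only in the first file (output order is unspecified).
            for (k in first)
                if (!(k in seen)) print k "\t" first[k] "\tabsent"
        }
    ' "$1" "$2"
}
```

Keying on chromosome, position, and alleles means a site reported with a different ALT allele in the two runs shows up as two "absent" lines rather than as a FILTER change.
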
  • Lien (Leuven, Member)
    edited August 2016

    One example that was found when the analysis was run per chromosome:

    • chr20:58903718. 1730 reads of which 10 reads contain the alternative allele (6 on + strand, 4 on - strand).

    A few examples that were found when the analysis was run on all chromosomes together:

    • chr2:29225494. 743 reads of which 254 reads contain the alternative allele (125 on + strand, 129 on - strand).
      This one is also found by the analysis per chromosome, but not as 'pass' but as 'clustered event, homologous mapping event'.

    • chr16:3769261. 1004 reads of which 622 reads contain the alternative allele (308 on + strand, 314 on - strand).
      This one is also found by the analysis per chromosome, but not as 'pass' but as 'clustered event'.

    • chr20:58840500. 1573 reads of which 9 reads contain the alternative allele (5 on + strand, 4 on - strand).
      This one is not found by the analysis per chromosome.

    The detailed MuTect2 output information is attached.
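
For reference, the alternative-allele fractions implied by the read counts above can be worked out with a small helper. This is plain arithmetic on the numbers quoted in this comment, not MuTect2 output, and the function name is hypothetical.

```shell
#!/bin/sh
# Fraction of reads carrying the alternative allele.
# $1 = reads with the alternative allele, $2 = total reads at the site.
alt_fraction() {
    awk -v alt="$1" -v depth="$2" 'BEGIN { printf "%.4f\n", alt / depth }'
}

alt_fraction 10 1730     # chr20:58903718 -> 0.0058
alt_fraction 254 743     # chr2:29225494  -> 0.3419
alt_fraction 622 1004    # chr16:3769261  -> 0.6195
alt_fraction 9 1573      # chr20:58840500 -> 0.0057
```

The two chr20 sites sit well below 1%, whereas the chr2 and chr16 sites have high fractions and differ between the runs only in their filter status.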