VCF output

cwardell (Tokyo, Japan; Member)


Are there any plans for MuTect to output its results as a VCF file? Like many other bioinformaticians, a lot of my current tool-chain is built around VCFs output by the GATK, including the critical annotation steps.

I'm really interested in using MuTect, but the lack of VCF output represents a not-insignificant hurdle to overcome, even in comparing the calls made by MuTect against other callers. I'll happily write some scripts to parse MuTect output into VCF, but thought I should ask first if I was being naive or if MuTect->VCF scripts already existed.


Best Answer


  • cwardell (Tokyo, Japan; Member)

    Thanks for your time; apologies for my apparent inability to read the documentation - I will get back to your paper and run some tests.

  • jeremywu (Member)

    What if I used an older version of MuTect and would prefer not to run MuTect again (it takes 2 days for one sample)? Is there any tool that can convert MuTect's plain output into .VCF or .MUT format?

  • kcibul (Cambridge, MA; Member, Broadie, Dev ✭✭✭)

    We don't have a tool that does this post-hoc conversion. However, you could run MuTect specifying just the loci from the original output, which should run much faster, in order to get VCF output in less time.
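Building the list of loci for such a re-run could be sketched in Python as below. This is only a sketch under assumptions: that the call_stats file has the contig and position in its first two columns and the KEEP/REJECT judgement in its last column (as in the rows MuTect prints), that comment lines start with `#`, and that the header row starts with `contig`. The function name `call_stats_to_intervals` is hypothetical. The output uses the `contig:start-stop` interval syntax that GATK-based tools accept.

```python
def call_stats_to_intervals(lines, keep_only=False):
    """Turn MuTect call_stats rows into GATK-style interval strings.

    Assumes whitespace-separated rows with contig and position first
    and the KEEP/REJECT judgement last (an assumption about the format).
    """
    intervals = []
    seen = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, '#' comment lines, and the column header row.
        if len(fields) < 2 or fields[0].startswith("#") or fields[0] == "contig":
            continue
        # Optionally restrict the re-run to sites that were judged KEEP.
        if keep_only and fields[-1] != "KEEP":
            continue
        contig, pos = fields[0], int(fields[1])
        if (contig, pos) not in seen:
            seen.add((contig, pos))
            # Single-base interval in contig:start-stop form.
            intervals.append(f"{contig}:{pos}-{pos}")
    return intervals
```

Writing the returned strings one per line to a file (for example a hypothetical `rerun.intervals`) and passing that file with `--intervals`, together with `--vcf`, would restrict the new run to the sites from the original output.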

  • suz_katie (Member)


    I am unable to generate output in VCF format. Can you specify the exact command to generate VCF output?

    Below is the command I am using:

    java -Xmx2g -jar muTect-1.0.27783.jar \
    --analysis_type MuTect \
    --reference_sequence hg19.fasta \
    --dbsnp dbsnp_132_b37.leftAligned.vcf \
    --cosmic hg19_cosmic_v54_120711.vcf \
    --input_file:normal Normal.bam \
    --input_file:tumor Tumor.bam \
    --out example.call_stats.txt \
    --coverage_file example.coverage.wig.txt

  • vyellapa (Member)

    java -Xmx2g -jar muTect-1.0.27783.jar --analysis_type MuTect --reference_sequence hg19.fasta --dbsnp dbsnp_132_b37.leftAligned.vcf --cosmic hg19_cosmic_v54_120711.vcf --input_file:normal Normal.bam --input_file:tumor Tumor.bam --out example.call_stats.txt --coverage_file example.coverage.wig.txt --vcf out.vcf

  • kcibul (Cambridge, MA; Member, Broadie, Dev ✭✭✭)

    Can you provide some more details? With the command as written above, does everything run properly but no "out.vcf" is produced, or is there some sort of error or warning message along the way?

    You are using an older version of MuTect, however. Can you try the version released with the publication?

  • vyellapa (Member)

    Sorry, my post was a reply to suz_katie's post, which asked for the exact command for VCF output. VCF output is working fine for me.

  • vivekdas_1987 (Milan; Member)
    edited September 2014

    @kcibul I ran MuTect on my normal/tumor exome pair using BAM files preprocessed with GATK (BQSR); the recalibrated BAM files are the input to MuTect for somatic mutation identification. Note that during BAM processing in GATK I already used the target BED file provided by the company, which was used for target enrichment during library preparation. I still wanted to see whether passing that target BED file to MuTect again would give mutations with better annotation on the exome. When I run MuTect without the --intervals option (i.e. without specifying the target enrichment BED file), I get around 1800 variants flagged KEEP, but when I use the --intervals option with MuTect 1.1.4, I do not get any variants flagged KEEP and COVERED. Is this not the right way to use the company's target BED file when running MuTect on GATK-processed BAM files? The command line I used is below.

    java -Xmx14g -jar /scratch/GT/softwares/mutect/muTect-1.1.4.jar --analysis_type MuTect --reference_sequence /scratch/GT/vdas/test_exome/exome/hg19.fa --cosmic /data/PGP/exome/mutect/hg19/hg19_cosmic_v54_120711.vcf --dbsnp /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf --input_file:normal /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.bam --input_file:tumor /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.bam --out /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.txt --coverage_file /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/LG.coverage.wig.txt --vcf /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.vcf --intervals /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed --fraction_contamination 0.5

  • vivekdas_1987 (Milan; Member)

    @kcibul I would like some expert advice or suggestions from you on this matter. It seems that the interval file is not the problem: I re-ran the analysis without the interval list (the BED file that comes with the target kit) but kept the tumor contamination fraction at 50% (0.5), and I get only 2 somatic variants. I also checked the scenario with VarScan at both 25% and 50% purity and found very few high-confidence variants. I retrieved around 180 variants with VarScan, of which only 8-9 were in exons. This might indicate that the variants are not real. I would like some input from you, or from anyone who has already analyzed exome data for somatic mutations in normal/tumor pairs where the tumor is 50% pure or less.

  • kcibul (Cambridge, MA; Member, Broadie, Dev ✭✭✭)

    Hi, I think this is actually just a misunderstanding. The parameter "--fraction_contamination" is not for stromal contamination (i.e. normal tissue contaminating the tumor, also referred to as purity). It is for cross-patient contamination (i.e. Patient 1's DNA is in Patient 2's DNA sample). Such contamination leads to false positive mutation calls because all of Patient 1's private SNPs are seen as low-level somatic events. Setting this parameter raises the threshold for calling events, thus improving specificity at the cost of sensitivity.

    You shouldn't set this parameter unless you have calculated the cross-patient contamination level with a method like ContEst or verifyBamID; otherwise you will see the reduced sensitivity you have observed.

  • vivekdas_1987 (Milan; Member)

    So is there any way to account for stromal contamination in the tumor when calling variants with MuTect?

  • artitandon (Member ✭✭)

    I ran MuTect outputting both the VCF and the call_stats files, and I am not sure how this works, since the variants marked KEEP in one and PASS in the other do not match at all. What is the difference between the two output files?

    I ran MuTect as follows:
    java -jar mutect-1.1.7.jar -T MuTect -R hg19.fa -I:normal normal.recal.bam -I:tumor tumor.recal.bam --dbsnp dbsnp_138.hg19.vcf --cosmic cosmic_hg19_liftover.vcf -L ATGATK.bed -o test.call_stats.out --coverage_file test.wig.txt --vcf test.mutect.out.vcf&

    And, if I look for the "KEEP" variants in test.call_stats.out I get:
    chr5 176519340 GGCxCAT C A tumor normal 0 NOVEL COVERED 0.739671 0.739671 1 1 0.999992 71 0 1.611127 6.654147 6.793853 0 0.096774 0.02 -0.280506 30 28 3 911 95 60 60 0 0 CC 11.737115 0 39 39 0 1303 0 0.391068 0.859155 (25,5,3,0) 87 0 12 0 0 KEEP

    And the "PASS" variants in test.mutect.out.vcf are as follows:


    chr5 67569259 . G T . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:245,675:27:920:0.734:2 0:931,0:.:933:0.00:0
    chr17 7578263 . G A . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:197,656:30:854:0.769:2 0:917,0:.:919:0.00:0

  • artitandon (Member ✭✭)

    Sorry, please ignore my question above. I had made a mistake; the output is fine.
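For anyone wanting to cross-check the two outputs in the same way, a rough Python sketch follows. It assumes that the last column of each call_stats row holds the KEEP/REJECT judgement and that the VCF carries PASS in its standard FILTER column (column 7). The function names are hypothetical, and whether the KEEP and PASS sets should match exactly is treated here as something to check, not a guarantee.

```python
def keep_sites(call_stats_lines):
    """Return {(contig, position)} for call_stats rows whose last column is KEEP."""
    sites = set()
    for line in call_stats_lines:
        fields = line.split()
        # Skip blank lines, '#' comment lines, and the column header row.
        if len(fields) < 2 or fields[0].startswith("#") or fields[0] == "contig":
            continue
        if fields[-1] == "KEEP":
            sites.add((fields[0], int(fields[1])))
    return sites


def pass_sites(vcf_lines):
    """Return {(contig, position)} for VCF records whose FILTER column is PASS."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # VCF meta-information and header lines
        fields = line.split()
        # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
        if len(fields) >= 7 and fields[6] == "PASS":
            sites.add((fields[0], int(fields[1])))
    return sites


def compare_calls(call_stats_lines, vcf_lines):
    """Split sites into those found in both files and those unique to each."""
    keep = keep_sites(call_stats_lines)
    passed = pass_sites(vcf_lines)
    return {
        "both": keep & passed,
        "keep_only": keep - passed,
        "pass_only": passed - keep,
    }
```

Non-empty `keep_only` or `pass_only` sets would point to a mismatch worth investigating, for instance two files that come from different runs.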
