
# VCF output

Tokyo, JapanMember

Hi

Are there any plans for MuTect to output its results as a VCF file? Like many other bioinformaticians, I have built a lot of my current tool-chain around VCFs output by the GATK, including the critical annotation steps.

I'm really interested in using MuTect, but the lack of VCF output represents a not-insignificant hurdle to overcome, even in comparing the calls made by MuTect against other callers. I'll happily write some scripts to parse MuTect output into VCF, but thought I should ask first if I was being naive or if MuTect->VCF scripts already existed.

Regards
Chris

• Tokyo, JapanMember

Thanks for your time; apologies for my apparent inability to read the documentation - I will get back to your paper and run some tests.

• Member

What if I used an older version of MuTect and, if possible, don't want to run MuTect again (it takes 2 days for one sample)? Is there any tool that can convert MuTect's plain output into .VCF or .MUT format?

We don't have a tool that does this post-hoc conversion. However, you could run MuTect specifying just the loci from the original output, which should run much faster, in order to get VCF output in less time.
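The suggestion above (re-running only on the loci from the original output) needs an interval list built from the old call_stats file. Here is a minimal Python sketch of that step. It assumes the call_stats file is tab-separated, that lines starting with `#` are comments, and that the header row has columns named `contig`, `position` and `judgement`; check these against your own file, since older MuTect versions may differ. The function name `call_stats_to_intervals` and the toy input are purely illustrative.

```python
import csv
import io


def call_stats_to_intervals(lines, keep_only=False):
    """Return 'contig:position' strings, one per distinct site, in file order."""
    # Assumption: '#' lines are comments and the first remaining line is the
    # tab-separated column header with 'contig', 'position', 'judgement'.
    rows = csv.DictReader(
        (line for line in lines if not line.startswith("#")),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    intervals = []
    seen = set()
    for row in rows:
        if keep_only and row.get("judgement") != "KEEP":
            continue
        interval = f"{row['contig']}:{row['position']}"
        if interval not in seen:
            seen.add(interval)
            intervals.append(interval)
    return intervals


# Toy input, made up for illustration only (not real MuTect output).
EXAMPLE = (
    "## comment line\n"
    "contig\tposition\tref_allele\talt_allele\tjudgement\n"
    "chr1\t100\tC\tA\tKEEP\n"
    "chr2\t200\tG\tT\tREJECT\n"
)

loci = call_stats_to_intervals(io.StringIO(EXAMPLE))
print("\n".join(loci))
```

Writing the result to a file, one interval per line, and passing that file to `--intervals` should restrict the re-run to the sites already reported. Whether your MuTect version accepts this list format is worth checking on a small test first.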

• Member

Hi

I am unable to generate output in VCF format. Can you specify the exact command to generate VCF output?

Below is the command I am using:

java -Xmx2g -jar muTect-1.0.27783.jar \
--analysis_type MuTect \
--reference_sequence hg19.fasta \
--dbsnp dbsnp_132_b37.leftAligned.vcf \
--cosmic hg19_cosmic_v54_120711.vcf \
--input_file:normal Normal.bam \
--input_file:tumor Tumor.bam \
--out example.call_stats.txt \
--coverage_file example.coverage.wig.txt

• Member

java -Xmx2g -jar muTect-1.0.27783.jar --analysis_type MuTect --reference_sequence hg19.fasta --dbsnp dbsnp_132_b37.leftAligned.vcf --cosmic hg19_cosmic_v54_120711.vcf --input_file:normal Normal.bam --input_file:tumor Tumor.bam --out example.call_stats.txt --coverage_file example.coverage.wig.txt --vcf out.vcf

Can you provide some more details? Does everything run properly but there is no "out.vcf" or is there some sort of error/warning message along the way? As written above.

You are using an older version of MuTect, however. Can you try the version released with the publication?

• Member

Sorry, my post was a reply to suz_katie's post, who wanted to see the exact command for VCF output. VCF output is working fine for me.

• MilanMember
edited September 2014

@kcibul I ran MuTect on my normal/tumor exome pair using BAM files that were preprocessed with GATK (BQSR); the recalibrated BAM files are the input to MuTect for somatic mutation identification. Note that during BAM processing in GATK I already used the target BED file provided by the company, which was used for target enrichment during library preparation. I still wanted to see whether using that target BED file again with MuTect would give me mutations with better annotation on the exome. When I run MuTect without the --intervals option, i.e. without specifying the target enrichment BED file, I get around 1800 variants flagged KEEP, but when I use the --intervals option with MuTect 1.1.4 I do not get any variants flagged KEEP and COVERED. Is it not right to use the company's target BED file when running MuTect on GATK-processed BAM files? The command line I used is below:

java -Xmx14g -jar /scratch/GT/softwares/mutect/muTect-1.1.4.jar --analysis_type MuTect --reference_sequence /scratch/GT/vdas/test_exome/exome/hg19.fa --cosmic /data/PGP/exome/mutect/hg19/hg19_cosmic_v54_120711.vcf --dbsnp /scratch/GT/vdas/test_exome/exome/databases/dbsnp_137.hg19.vcf --input_file:normal /scratch/GT/vdas/pietro/exome_seq/results/N_S8981/N_S8981.realigned.recal.bam --input_file:tumor /scratch/GT/vdas/pietro/exome_seq/results/T_S7999/T_S7999.realigned.recal.bam --out /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.txt --coverage_file /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/LG.coverage.wig.txt --vcf /scratch/GT/vdas/pietro/exome_seq/results/mutect/param_test/mutect_S_333soma_t.vcf --intervals /scratch/GT/vdas/referenceBed/hg19/ss_v4/SureSelect_XT_Human_All_Exon_V4.bed --fraction_contamination 0.5

• MilanMember

@kcibul I would like some expert advice or suggestions from you on this matter. It seems that the interval file is not the problem: I re-ran the analysis without the interval list (the BED file that comes with the target kit) but kept the tumor contamination fraction at 50% (0.5), and I get only 2 somatic variants. I also checked the scenario with VarScan at both 25% and 50% purity and found very few high-confidence variants. I retrieved around 180 variants with VarScan, of which only 8-9 were in exons. This might indicate that the variants are not real. I would like some input from you, as someone who has already tried somatic variant analysis on exome data for normal/tumor pairs with tumor purity of 50% or less.

Hi -- I think this is actually just a misunderstanding. The parameter "--fraction_contamination" is not for stromal contamination (i.e. normal tissue contaminating the tumor, also referred to as purity). It is for cross-patient contamination (i.e. Patient 1's DNA is in Patient 2's DNA sample). Such contamination leads to false positive mutation calls, as all of Patient 1's private SNPs are seen as low-level somatic events. Setting this parameter raises the threshold for calling events, thus improving specificity at the cost of sensitivity.

You shouldn't be setting this parameter unless you are calculating the cross-patient contamination level via a method like ContEst or VerifyBamID, or else you will see the reduced sensitivity you have observed.

• MilanMember

So is there any way to account for stromal contamination in the tumor when calling variants with MuTect?

• Member

I ran MuTect outputting both the VCF and the call_stats files and am not sure how this is working, since the variants that I get as "Keep/Pass" in either case do not match at all. What is the difference in the two output files?

I ran MuTect as follows:
java -jar mutect-1.1.7.jar -T MuTect -R hg19.fa -I:normal normal.recal.bam -I:tumor tumor.recal.bam --dbsnp dbsnp_138.hg19.vcf --cosmic cosmic_hg19_liftover.vcf -L ATGATK.bed -o test.call_stats.out --coverage_file test.wig.txt --vcf test.mutect.out.vcf&

And, if I look for the "KEEP" variants in test.call_stats.out I get:
chr5 176519340 GGCxCAT C A tumor normal 0 NOVEL COVERED 0.739671 0.739671 1 1 0.999992 71 0 1.611127 6.654147 6.793853 0 0.096774 0.02 -0.280506 30 28 3 911 95 60 60 0 0 CC 11.737115 0 39 39 0 1303 0 0.391068 0.859155 (25,5,3,0) 87 0 12 0 0 KEEP

And the "PASS" variants in test.mutect.out.vcf are as follows:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal
chr5 67569259 . G T . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:245,675:27:920:0.734:2 0:931,0:.:933:0.00:0
chr17 7578263 . G A . PASS SOMATIC;VT=SNP GT:AD:BQ:DP:FA:SS 0/1:197,656:30:854:0.769:2 0:917,0:.:919:0.00:0
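To see exactly where the two files disagree, it may help to compare the KEEP sites in the call_stats file against the PASS sites in the VCF by contig and position. Below is a rough Python sketch of such a comparison. It assumes the call_stats file is tab-separated with a header containing `contig`, `position` and `judgement` columns, and that the VCF on disk is tab-separated (the forum display collapses tabs to spaces). The function names and the toy data are hypothetical, and this only reports the mismatch; it does not explain it.

```python
import csv
import io


def keep_sites(call_stats_lines):
    """Set of (contig, position) for call_stats rows judged KEEP."""
    # Assumption: '#' lines are comments; next line is the column header.
    rows = csv.DictReader(
        (line for line in call_stats_lines if not line.startswith("#")),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    return {
        (row["contig"], int(row["position"]))
        for row in rows
        if row["judgement"] == "KEEP"
    }


def pass_sites(vcf_lines):
    """Set of (CHROM, POS) for VCF records whose FILTER column is PASS."""
    sites = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, ...
        if fields[6] == "PASS":
            sites.add((fields[0], int(fields[1])))
    return sites


# Toy inputs, made up for illustration only.
CALL_STATS = (
    "## comment line\n"
    "contig\tposition\tjudgement\n"
    "chr1\t100\tKEEP\n"
    "chr1\t250\tREJECT\n"
    "chr2\t300\tKEEP\n"
)
VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\t.\tC\tA\t.\tPASS\tSOMATIC\n"
    "chr1\t250\t.\tG\tT\t.\tREJECT\t.\n"
    "chr3\t400\t.\tG\tA\t.\tPASS\tSOMATIC\n"
)

keep = keep_sites(io.StringIO(CALL_STATS))
passed = pass_sites(io.StringIO(VCF))

print("in both:  ", sorted(keep & passed))
print("KEEP only:", sorted(keep - passed))
print("PASS only:", sorted(passed - keep))
```

With real files, replace the `io.StringIO` objects with open file handles. If the "KEEP only" and "PASS only" lists are both non-empty, checking that both files come from the same run and the same interval list is a reasonable first step.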

• Member