Duplicate mutations in Oncotator output
I use the following steps for mutation calling with a matched normal:
1. MuTect is used for mutation calling, generating VCF output. The command line is:
java -Xmx8g -jar XXXX.jar -T MuTect -R GRCh37.fa --dbsnp dbsnp_138.b37.vcf --cosmic b37_CosmicCodingMuts_v70.vcf --cosmic b37_CosmicNonCodingVariants_v70.vcf --tumor_sample_name TUMOR --input_file:tumor tumor.bam --out call_stats.txt --coverage_file coverage.wig.txt --vcf mutect.out.vcf --normal_sample_name NORMAL --input_file:normal normal.bam
A few data lines from the VCF are shown here:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
1 10231 rs200279319 C A . REJECT DB GT:AD:BQ:DP:FA 0:2,0:.:1:0.00 0/1:0,1:33:1:1.00
1 10419 . T G . REJECT . GT:AD:BQ:DP:FA 0:11,3:.:8:0.214 0/1:3,0:.:3:0.00
1 10425 . T G . REJECT . GT:AD:BQ:DP:FA 0:7,5:.:8:0.417 0/1:2,0:.:2:0.00
2. Oncotator is used to annotate the VCF file generated by MuTect. The command line is:
oncotator --input_format=VCF --db-dir oncotator_v1_ds_Jan262015 -c tx_exact_uniprot_matches.AKT1_CRLF2_FGFR1.txt mutect.out.vcf oncotator.maf.txt hg19
A couple of data lines from the Oncotator MAF output are shown here (only the first few columns):
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_position End_position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 Tumor_Sample_Barcode Matched_Norm_Sample_Barcode
DDX11L1 100287102 __UNKNOWN__ __UNKNOWN__ 1 10231 10231 __UNKNOWN__ RNA SNP C C A rs200462216|rs200279319|rs376846324 __UNKNOWN__ NORMAL
DDX11L1 100287102 __UNKNOWN__ __UNKNOWN__ 1 10231 10231 __UNKNOWN__ RNA SNP C C A rs200462216|rs200279319|rs376846324 __UNKNOWN__ TUMOR
For each mutation in the VCF from MuTect, e.g. chr1:10231, there are two records in the Oncotator output: one with the field "Matched_Norm_Sample_Barcode" = NORMAL, the other with the field "Matched_Norm_Sample_Barcode" = TUMOR. Also, in both records, the field "Tumor_Sample_Barcode" is set to "__UNKNOWN__".
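To check that this happens for every site and not just the one shown, the duplication can be counted with a small script. This is only a sketch: it assumes the Oncotator output is tab-delimited with optional "#" comment lines before the header, and the function name find_duplicate_sites and the example rows are made up for illustration (they contain only a subset of the real columns).

```python
import csv
from collections import defaultdict


def find_duplicate_sites(maf_lines):
    """Return sites that occur on more than one MAF row.

    Keys are (chrom, start, ref, alt); values list the
    Matched_Norm_Sample_Barcode of every row at that site.
    """
    # Assumption: comment lines start with "#" and precede the header row.
    reader = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    seen = defaultdict(list)
    for row in reader:
        key = (
            row["Chromosome"],
            row["Start_position"],
            row["Reference_Allele"],
            row["Tumor_Seq_Allele2"],
        )
        seen[key].append(row["Matched_Norm_Sample_Barcode"])
    return {key: names for key, names in seen.items() if len(names) > 1}


# Made-up rows mirroring the chr1:10231 example above (subset of columns).
COLUMNS = [
    "Chromosome", "Start_position", "Reference_Allele",
    "Tumor_Seq_Allele2", "Tumor_Sample_Barcode", "Matched_Norm_Sample_Barcode",
]
EXAMPLE_MAF = [
    "#version 2.4",
    "\t".join(COLUMNS),
    "\t".join(["1", "10231", "C", "A", "__UNKNOWN__", "NORMAL"]),
    "\t".join(["1", "10231", "C", "A", "__UNKNOWN__", "TUMOR"]),
]

for site, barcodes in find_duplicate_sites(EXAMPLE_MAF).items():
    print(site, barcodes)
```

On the real file, the same function could be called as find_duplicate_sites(open("oncotator.maf.txt")).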
1. What does this mean? How can I get the correct values for "Tumor_Sample_Barcode" and "Matched_Norm_Sample_Barcode"?
2. From the Oncotator output above, I only need the somatic mutations in the tumor sample relative to the normal sample. Confusingly, the record with "Matched_Norm_Sample_Barcode" = TUMOR appears to be the right one: its t_alt_count equals the number of reads covering the mutated base in the tumor sample. How can I fix this?
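As a stopgap, I could post-process the output myself, roughly as sketched here. This is a workaround under my own assumptions, not a fix for the underlying behaviour: it assumes the record with "Matched_Norm_Sample_Barcode" = TUMOR really is the tumor-versus-normal record, and that the file is tab-delimited with "#" comment lines before the header. The function name keep_tumor_rows and the t_alt_count values in the example rows are made up for illustration.

```python
import csv


def keep_tumor_rows(maf_lines, tumor_name="TUMOR", normal_name="NORMAL"):
    """Keep one row per site and patch the two barcode fields.

    Assumption: the row whose Matched_Norm_Sample_Barcode equals
    tumor_name is the one carrying the tumor read counts.
    """
    reader = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    kept = []
    for row in reader:
        if row["Matched_Norm_Sample_Barcode"] != tumor_name:
            continue  # drop the record labelled with the normal sample
        row["Tumor_Sample_Barcode"] = tumor_name
        row["Matched_Norm_Sample_Barcode"] = normal_name
        kept.append(row)
    return kept


# Made-up rows for chr1:10231; the t_alt_count values are illustrative only.
FIELDS = [
    "Chromosome", "Start_position", "Tumor_Sample_Barcode",
    "Matched_Norm_Sample_Barcode", "t_alt_count",
]
SAMPLE_ROWS = [
    "\t".join(FIELDS),
    "\t".join(["1", "10231", "__UNKNOWN__", "NORMAL", "0"]),
    "\t".join(["1", "10231", "__UNKNOWN__", "TUMOR", "1"]),
]

for row in keep_tumor_rows(SAMPLE_ROWS):
    print(row["Chromosome"], row["Start_position"],
          row["Tumor_Sample_Barcode"], row["Matched_Norm_Sample_Barcode"],
          row["t_alt_count"])
```

I would still prefer to have Oncotator write the correct barcodes directly, since this patch silently depends on the sample names passed to MuTect.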