bug in MutSigCV_1.4, 1.3; fixed


MutSigCV_1.4 has a bug: if the coverage file uses TCGA-format Tumor_Sample_Barcodes, the program dies with the error
'some patients in mutation_file are not accounted for in coverage_file'
The cause is in the function load_struct(), which removes every character other than letters, digits, and underscores from the Tumor_Sample_Barcodes read from the coverage file, including the hyphens in TCGA barcodes. The Tumor_Sample_Barcodes in the maf file are not processed by this function, so the two sets of barcodes no longer match when they are compared.
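
To make the mismatch concrete, here is a rough Python re-creation of the stripping step. It is not MutSigCV code: the helper name strip_non_word and the sample barcode are made up for illustration, and the character class is an ASCII-only stand-in for the '\W' pattern used in load_struct().

```python
import re

def strip_non_word(name):
    # Hypothetical stand-in for MATLAB's regexprep(name, '\W', ''):
    # drop everything except A-Z, a-z, 0-9 and underscore.
    return re.sub(r'[^A-Za-z0-9_]', '', name)

barcode = "TCGA-AB-1234"  # made-up barcode in TCGA style

stripped = strip_non_word(barcode)
print(stripped)             # TCGAAB1234
print(stripped == barcode)  # False: the hyphens are gone
```

Once the hyphens are deleted, the coverage-side name no longer contains the separators at all, which is consistent with the comparison against the maf file failing.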

This bug is also present in the MutSigCV_1.3 version available for download, as well as in the version which runs on the GenePattern Server.

Below is a patch that converts hyphens to underscores before the other non-word characters are stripped, so that the Tumor_Sample_Barcodes in the two files can be compared.

--- MutSigCV.m_orig    2015-01-15 17:55:06.871166000 -0800
+++ MutSigCV.m  2015-01-15 17:56:24.978109000 -0800
@@ -1518,6 +1518,7 @@
   % and convert to list of unique field names

   if P.lowercase_fieldnames, fields = lower(fields); end
+  fields = regexprep(fields, '-', '_');  % convert hyphens to underscores
   fields = regexprep(fields, '\W','');   % remove any characters except A-Z, a-z, 0-9, underscore
   fields_orig = fields;
   fields = genvarname(fields_orig);
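
As a quick sanity check on the ordering in the patch, the two regexprep calls can be mimicked in Python. This is only a sketch: sanitize_patched is a hypothetical name, the barcode is made up, and the character class is an ASCII-only approximation of '\W'.

```python
import re

def sanitize_patched(name):
    # Step added by the patch: hyphens become underscores first.
    name = re.sub(r'-', '_', name)
    # Existing step: remove anything except A-Z, a-z, 0-9, underscore.
    return re.sub(r'[^A-Za-z0-9_]', '', name)

print(sanitize_patched("TCGA-AB-1234"))  # TCGA_AB_1234
```

Because the hyphens have already been turned into underscores, the second substitution leaves them alone and the separators in the barcode survive.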

Best regards,

Chuck Connolly



  • rontonronton USAMember

    Dear Chuck,

    It seems you have some experience with MutSig. I want to run MutSig, and I am a bit unsure about the input files.

    Have you tried running MutSig on data derived from MuTect?

    Is it possible to convert the MuTect output, either .vcf or .txt into a .maf for MutSig?

    The coverage table (I have this as intervals.txt) and covariates table inputs have universal options if I understand correctly, but maybe some of that information is also in the MuTect output?

    Any advice is appreciated, thank you.
