bug in MutSigCV_1.4, 1.3; fixed
MutSigCV_1.4 has a bug: if the coverage file uses TCGA-format Tumor_Sample_Barcodes (which contain hyphens), the program dies with the error
'some patients in mutation_file are not accounted for in coverage_file'
The cause is the function load_struct(), which strips every character other than letters, digits, and underscores from the Tumor_Sample_Barcodes in the coverage file, so the hyphens are removed. The Tumor_Sample_Barcodes in the maf file are not processed by this function, so the two sets of barcodes can no longer be matched against each other.
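To make the effect concrete, here is a minimal sketch in Python (not the MATLAB source) of what the `\W` substitution does to a hyphenated barcode. The helper name `strip_non_word` and the barcode value are made up for illustration; only the regular expression is taken from the MutSigCV code quoted in the patch.

```python
import re


def strip_non_word(name: str) -> str:
    """Python stand-in for MATLAB's regexprep(fields, '\\W', '').

    Drops every character except A-Z, a-z, 0-9 and underscore.
    """
    return re.sub(r"\W", "", name, flags=re.ASCII)


# Illustrative TCGA-style barcode (hypothetical, not from a real dataset).
barcode = "TCGA-AB-1234-01"

header = strip_non_word(barcode)
print(header)             # TCGAAB123401
print(header == barcode)  # False: the stripped name no longer matches
```

Once the hyphens are gone, the stripped name cannot be matched back to the hyphenated barcode, which is consistent with the error message above.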
This bug is also present in the MutSigCV_1.3 version available for download, as well as in the version which runs on the GenePattern Server.
Below is a patch that converts hyphens to underscores before the characters other than letters, digits, and underscores are removed, which allows the Tumor_Sample_Barcodes to be compared.
--- MutSigCV.m_orig 2015-01-15 17:55:06.871166000 -0800
+++ MutSigCV.m 2015-01-15 17:56:24.978109000 -0800
@@ -1518,6 +1518,7 @@
 % and convert to list of unique field names
 if P.lowercase_fieldnames, fields = lower(fields); end
+fields = regexprep(fields, '-', '_'); % convert hyphens to underscores
 fields = regexprep(fields, '\W',''); % remove any characters except A-Z, a-z, 0-9, underscore
 fields_orig = fields;
 fields = genvarname(fields_orig);
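The order of the two substitutions matters: the hyphen replacement only helps if it runs before the `\W` removal. The following Python sketch (again a stand-in for the MATLAB calls, with hypothetical function names and a made-up barcode) compares the patched order with the reverse order.

```python
import re


def patched_order(name: str) -> str:
    """Hyphens become underscores first, then non-word characters are removed."""
    name = name.replace("-", "_")
    return re.sub(r"\W", "", name, flags=re.ASCII)


def reversed_order(name: str) -> str:
    """Non-word characters are removed first, so no hyphens are left to convert."""
    name = re.sub(r"\W", "", name, flags=re.ASCII)
    return name.replace("-", "_")


# Illustrative TCGA-style barcode (hypothetical, not from a real dataset).
barcode = "TCGA-AB-1234-01"

print(patched_order(barcode))   # TCGA_AB_1234_01
print(reversed_order(barcode))  # TCGAAB123401
```

With the patched order the separators survive as underscores. This helps only if the barcodes from the maf file are compared in the same underscore form, which this post implies but does not show.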