Bug in MutSigCV_1.4 and 1.3; fixed

Hi,

There's a bug in MutSigCV_1.4: if the coverage file uses TCGA-format Tumor_Sample_Barcodes, the program dies with the error
'some patients in mutation_file are not accounted for in coverage_file'
The cause is in the function load_struct(), which strips every character other than letters, digits, and underscores from the Tumor_Sample_Barcodes it reads from the coverage file, so the hyphens in TCGA barcodes are removed. The Tumor_Sample_Barcodes in the maf file are not processed by this function, so the barcodes from the two files no longer compare equal.
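
To make the mismatch concrete, here is a minimal Python sketch. It is not MutSigCV code: `strip_non_word` is a hypothetical stand-in for the `regexprep(fields, '\W', '')` step in load_struct(), the barcode is a made-up example in TCGA style, and it is assumed that the barcode appears as a column header in the coverage file.

```python
import re

def strip_non_word(name: str) -> str:
    """Hypothetical stand-in for regexprep(fields, '\\W', '') in load_struct()."""
    # Drop everything except A-Z, a-z, 0-9 and underscore.
    return re.sub(r"[^A-Za-z0-9_]", "", name)

# Made-up TCGA-style barcode. Assumption: it is a column header in the
# coverage file and a Tumor_Sample_Barcode value in the maf file.
maf_barcode = "TCGA-AB-1234-01"
coverage_header = strip_non_word("TCGA-AB-1234-01")

print(coverage_header)                 # the hyphens have been removed
print(coverage_header == maf_barcode)  # the two names no longer match
```

Any patient whose barcode contains a hyphen would fail a name comparison of this kind, which is consistent with the error message above.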

This bug is also present in the MutSigCV_1.3 version available for download, as well as in the version which runs on the GenePattern Server.

Below is a patch that converts hyphens to underscores before the remaining non-word characters are stripped, so that the Tumor_Sample_Barcodes from the two files can be compared.

--- MutSigCV.m_orig    2015-01-15 17:55:06.871166000 -0800
+++ MutSigCV.m  2015-01-15 17:56:24.978109000 -0800
@@ -1518,6 +1518,7 @@
   % and convert to list of unique field names

   if P.lowercase_fieldnames, fields = lower(fields); end
+  fields = regexprep(fields, '-', '_');  % convert hyphens to underscores
   fields = regexprep(fields, '\W','');   % remove any characters except A-Z, a-z, 0-9, underscore
   fields_orig = fields;
   fields = genvarname(fields_orig);
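
The effect of the reordering can be sketched in Python. This is only an illustration under assumptions: `sanitize_original` and `sanitize_patched` are hypothetical names mimicking the two `regexprep` calls before and after the patch, the barcodes are invented, and the later `genvarname` step is not modelled.

```python
import re

def sanitize_original(name: str) -> str:
    # Before the patch: strip everything except letters, digits, underscore.
    return re.sub(r"[^A-Za-z0-9_]", "", name)

def sanitize_patched(name: str) -> str:
    # After the patch: turn hyphens into underscores first, then strip.
    name = name.replace("-", "_")
    return re.sub(r"[^A-Za-z0-9_]", "", name)

# Two invented barcodes that differ only in where the hyphens fall.
barcodes = ["TCGA-AB-1234-01", "TCGA-AB-12-3401"]

print([sanitize_original(b) for b in barcodes])  # separators lost
print([sanitize_patched(b) for b in barcodes])   # separators kept as '_'
```

With the patched order the separators survive as underscores, so the sanitized names keep the structure of the original barcodes, and names that differ only in hyphen placement stay distinct.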

Best regards,

Chuck Connolly

Comments

  • ronton (USA, Member)

    Dear Chuck,

    It seems you have some experience with MutSig. I want to run MutSig, and I am a bit unsure about the input files.

    Have you tried running MutSig on data derived from MuTect?

    Is it possible to convert the MuTect output, either .vcf or .txt, into a .maf for MutSig?

    The coverage table (I have this as intervals.txt) and covariates table inputs have universal options if I understand correctly, but maybe some of that information is also in the MuTect output?

    Any advice is appreciated, thank you.
