Detection of somatic mutations with RNAseq - Mutect2

mcharles, Jouy en Josas, Member

We are working on a swine model of melanoma, where tumors trigger an efficient immune response, most likely by producing neoantigens, i.e., proteins that carry somatic mutations and are thereby recognized as tumor antigens by the immune system.

We previously used Mutect2 from GATK for variant calling on tumor exome data (from another project) and tried to use it again, this time to detect somatic mutations in tumors with RNAseq (16 matched pairs “normal tissue-tumor tissue”), as we don't have the corresponding DNAseq. We got the following error:

ERROR MESSAGE: Unsupported CIGAR operator N in read ST-J00115:28:H5HTYBBXX:2:1214:24637:30538 at 1:12383. Perhaps you are trying to use RNA-Seq data? While we are currently actively working to support this data type unfortunately the GATK cannot be used with this data in its current form. You have the option of either filtering out all reads with operator N in their CIGAR string (please add --filter_reads_with_N_cigar to your command line) or assume the risk of processing those reads as they are including the pertinent unsafe flag (please add -U ALLOW_N_CIGAR_READS to your command line). Notice however that if you were to choose the latter, an unspecified subset of the analytical outputs of an unspecified subset of the tools will become unpredictable. Consequently the GATK team might well not be able to provide you with the usual support with any issue regarding any output
ERROR ------------------------------------------------------------------------------------------
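
For context on what the message is objecting to: in a SAM/BAM CIGAR string, the N operator marks a skipped region of the reference, which for RNAseq alignments typically corresponds to an intron spanned by a spliced read. The snippet below is only a minimal sketch of how one could check which CIGAR strings carry that operator. The helper names (`parse_cigar`, `has_skipped_region`, `skipped_lengths`) and the example CIGAR strings are invented for illustration and are not part of GATK.

```python
import re

# A CIGAR string is a run of <length><operator> tokens; the operator
# alphabet below is the one defined in the SAM specification.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operator) tuples."""
    if cigar == "*":  # SAM uses "*" when no CIGAR is available
        return []
    tokens = _CIGAR_TOKEN.findall(cigar)
    # Reject strings that contain anything other than valid tokens.
    if "".join(length + op for length, op in tokens) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in tokens]


def has_skipped_region(cigar):
    """Return True if the alignment contains an N (skipped region) operator."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def skipped_lengths(cigar):
    """Return the lengths of all N segments, in order of appearance."""
    return [length for length, op in parse_cigar(cigar) if op == "N"]


if __name__ == "__main__":
    # Hypothetical CIGAR strings: one spliced read, one unspliced read.
    for example in ("50M1200N51M", "101M"):
        print(example, has_skipped_region(example), skipped_lengths(example))
```

A spliced RNAseq read such as the hypothetical `50M1200N51M` would be flagged, while an unspliced `101M` read would not, which is why this error shows up on RNAseq alignments but did not on our exome data.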

We have also read on several forums dedicated to variant calling that this tool is not suited to RNAseq data, notably because of splicing and because read depth depends on expression levels.

  • Should we try running a few samples anyway and check the output? We do not want an exhaustive list of variants, just a general idea of whether our approach to neoantigens is relevant.
  • Is it totally hopeless, since it would just give us false positives?
  • Is there an alternative way to call variants on these data, integrating matched pairs, and without a reference genome for the samples?
