Detection of somatic mutations with RNAseq - Mutect2
We are working on a swine model of melanoma in which tumors trigger an efficient immune response, most likely by producing neoantigens, i.e. proteins that carry somatic mutations and are therefore recognized as tumor antigens by the immune system.
We previously used Mutect2 from GATK for variant calling on tumor exomes (in another project) and tried to use it again, this time to detect somatic mutations in tumors from RNAseq data (16 matched “normal tissue-tumor tissue” pairs), as we don't have the corresponding DNAseq data. We got the following error:
ERROR MESSAGE: Unsupported CIGAR operator N in read ST-J00115:28:H5HTYBBXX:2:1214:24637:30538 at 1:12383. Perhaps you are trying to use RNA-Seq data? While we are currently actively working to support this data type unfortunately the GATK cannot be used with this data in its current form. You have the option of either filtering out all reads with operator N in their CIGAR string (please add --filter_reads_with_N_cigar to your command line) or assume the risk of processing those reads as they are including the pertinent unsafe flag (please add -U ALLOW_N_CIGAR_READS to your command line). Notice however that if you were to choose the latter, an unspecified subset of the analytical outputs of an unspecified subset of the tools will become unpredictable. Consequently the GATK team might well not be able to provide you with the usual support with any issue regarding any output
We have also read on several forums dedicated to variant calling that this tool is not well suited to RNAseq data, notably because of splicing and because read depth depends on expression levels.
- Should we try running a few samples anyway and check the output? We do not need an exhaustive list of variants, just a general idea of whether our approach to neoantigens is relevant.
- Is it totally hopeless since it would just give us false positive results?
- Is there an alternative way to call variants on these data, integrating matched pairs, and without a reference genome for the samples?
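For context on the error message above: in the SAM format, the CIGAR operator N marks a region skipped from the reference, which in RNAseq alignments typically corresponds to an intron spanned by a spliced read. The following is only a minimal sketch, in plain Python, of how one might count the reads affected in a SAM text file. The function names and the example records are hypothetical, and this is not part of GATK.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '50M1200N51M' into (length, operator) pairs."""
    if cigar == "*":  # '*' means the CIGAR is unavailable
        return []
    elements = CIGAR_ELEMENT.findall(cigar)
    # Reject strings that contain anything other than well-formed elements.
    if "".join(length + op for length, op in elements) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(length), op) for length, op in elements]


def has_skipped_region(cigar):
    """Return True if the CIGAR contains an N (skipped region) operator."""
    return any(op == "N" for _, op in parse_cigar(cigar))


def count_spliced_reads(sam_lines):
    """Return (total_records, records_with_N) for an iterable of SAM text lines."""
    total = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a complete alignment record
            continue
        total += 1
        if has_skipped_region(fields[5]):  # column 6 holds the CIGAR string
            spliced += 1
    return total, spliced


# Hypothetical records for illustration only (not taken from our data).
example = [
    "@HD\tVN:1.6",
    "read1\t0\t1\t12383\t60\t50M1200N51M\t*\t0\t0\t*\t*",
    "read2\t0\t1\t20000\t60\t101M\t*\t0\t0\t*\t*",
]
print(count_spliced_reads(example))
```

A high proportion of reads with N simply reflects spliced alignments; it says nothing by itself about whether somatic calls made from those reads would be reliable.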