RNA-seq and WGS for cancer samples (no paired T-N)

Dear GATK Team,

I want to do SNP calling on cancer RNA-seq data (no matched normals). In some cases I also have matching WGS data (on which I also want to do SNP calling), but not for most.
If I understand correctly, I cannot use MuTect2 without matched normals, so would you recommend GATK or a completely different tool?

I very much appreciate your help.




  • Juls (Member)

    Oh, something important I should mention: I am not interested in somatic mutations in particular. I am actually interested in all heterozygous SNPs in a given sample. Therefore, GATK should be the right caller for me, shouldn't it?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)
    edited November 2016

    Hi there,

    First, I just want to clarify that GATK is a toolkit that includes many tools, several of which are variant callers, and MuTect2 is one of them. So the real choice here is between HaplotypeCaller and MuTect2.

    Based on what you're describing, I would recommend HaplotypeCaller to capture germline events in your samples, but I do so with reservations, because it's not clear to me what you ultimately want to achieve. If you tell me more I may be able to give you more definitive advice.

  • Juls (Member)

    Thank you very much for your answer. Yes, I meant HaplotypeCaller, not MuTect2.
    I want to do a couple of things, among others single-sample differential allelic expression analysis (single-sample because I have no paired T-N, just unpaired T and N samples) and SNP density analysis. For the former I need heterozygous SNPs, all of them, somatic and germline, since any heterozygous SNP could carry information on ADE. For the latter I need all SNPs.
    Thanks for your help.
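For the SNP density part, binning SNP calls into fixed-size windows could look roughly like the following. This is a minimal sketch, assuming an uncompressed VCF read as text lines; it counts only single-nucleotide records whose FILTER is PASS or ".", and the 1 Mb window and the function names are hypothetical choices. Keep in mind that RNA-seq calls are limited to expressed regions, so raw counts per window are not directly comparable to WGS counts.

```python
from collections import Counter

def iter_pass_snvs(vcf_lines):
    """Yield (chrom, pos) for PASS single-nucleotide records in VCF text lines."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, filt = fields[0], int(fields[1]), fields[3], fields[4], fields[6]
        if filt not in ("PASS", "."):
            continue  # drop records that failed a filter
        # Keep only SNVs: one REF base and exactly one single-base ALT allele.
        if len(ref) == 1 and len(alt) == 1 and alt != ".":
            yield chrom, pos

def snp_density(vcf_lines, window=1_000_000):
    """Count SNVs per window; keys are (chrom, 0-based window start)."""
    counts = Counter()
    for chrom, pos in iter_pass_snvs(vcf_lines):
        # VCF POS is 1-based, so shift to 0-based before binning.
        start = (pos - 1) // window * window
        counts[(chrom, start)] += 1
    return dict(counts)

if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.",
        "chr1\t999999\t.\tC\tT\t50\tPASS\t.",
        "chr1\t1000001\t.\tG\tA\t50\tPASS\t.",
        "chr1\t2000\t.\tAT\tA\t50\tPASS\t.",   # indel, ignored
        "chr1\t3000\t.\tC\tA\t5\tLowQual\t.",  # filtered, ignored
    ]
    print(snp_density(example))  # {('chr1', 0): 2, ('chr1', 1000000): 1}
```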

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I see, thanks for clarifying.

    First you should pre-process your samples according to our Best Practices recommendations -- be sure to apply the RNAseq-specific procedure for the RNAseq samples, otherwise you will have technical problems.

    You can use HaplotypeCaller to call germline variants on the normal samples, and use MuTect2 to call somatic variants on tumor samples in tumor-only (="artifact detection") mode, which is unsupported but can be done. Then you'll need to evaluate how the callsets overlap to rule out likely germline events from the somatic callsets.

    Keep in mind that with this experimental design (lots of unmatched samples), you will have many confounding factors to deal with in your analysis. We won't be able to give you any guidance on interpreting results since this diverges so much from our best practices. Good luck!
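The overlap step described in this reply (removing likely germline events from the tumor-only MuTect2 callsets) could be sketched as a set difference on variant keys. This is a minimal illustration, not a GATK tool: it assumes all callsets were normalized the same way (for example left-aligned) so identical variants are written identically, and that germline sites from all the HaplotypeCaller normal callsets are pooled, since the normals are unmatched. The function names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Return a set of (chrom, pos, ref, alt) keys, one per ALT allele."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        # Multi-allelic records list several ALT alleles separated by commas.
        for alt in fields[4].split(","):
            keys.add((chrom, pos, ref, alt))
    return keys

def remove_likely_germline(somatic_lines, germline_callsets):
    """Drop somatic candidates whose exact allele appears in any germline callset."""
    pooled_germline = set()
    for callset in germline_callsets:
        pooled_germline |= variant_keys(callset)
    return sorted(variant_keys(somatic_lines) - pooled_germline)
```

Exact matching is deliberately strict. It also cannot remove germline variants that simply never occur in the unmatched normals, which is one of the confounding factors mentioned above.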

  • Juls (Member)

    Hi Geraldine,

    Thank you for your answer! I have been thinking about this, but I am still a bit confused.
    If I do differential allelic expression analysis on a sample, I need at least one heterozygous SNP in a transcript, and it doesn't matter whether it's somatic or germline. Any will do. So why would I need to use MuTect2 for the cancer samples? Could I not just use HaplotypeCaller on all samples (cancer and normal)?

    Thanks again!

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    It depends on what you're looking for. You only told us you want to look at allele expression in samples, but you didn't say what type of variants you wish to evaluate in those samples. Do you want to evaluate allelic expression of germline variants in your tumor sample? Or of somatic variants? Both can be meaningful in theory.

    To identify somatic mutations, you should use MuTect; to identify germline variants, you should use HaplotypeCaller. This is because they apply very different statistical methods for evaluating the presence of variation in the read data.

    Then you provide whichever callset you produced to the ASE tool.
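Because the allelic expression analysis relies on heterozygous SNPs, whichever callset you produce would typically be reduced to heterozygous biallelic SNVs for the sample of interest before it goes to the ASE tool. A minimal sketch, assuming an uncompressed VCF with a GT key in the FORMAT column; the function name is hypothetical and no quality filtering is applied.

```python
def het_snvs(vcf_lines, sample_index=0):
    """Yield (chrom, pos, ref, alt) for biallelic SNVs that are heterozygous
    in the given sample (0 = first sample column)."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Biallelic SNVs only: single-base REF and one single-base ALT.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue
        sample_values = fields[9 + sample_index].split(":")
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        alleles = sample_values[fmt_keys.index("GT")].replace("|", "/").split("/")
        # With a single ALT allele, two different called alleles means 0/1.
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            yield chrom, pos, ref, alt
```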

  • Juls (Member)

    Thank you very much for your quick answer. So if I want both, I need to use both. Just to check my understanding: HaplotypeCaller will call germline variants as well as high-frequency somatic mutations in the cancer sample, but not all somatic mutations (low-frequency ones will be missed) because of the different statistical methods?

    Thank you!

  • Sheila (Broad Institute; Member, Broadie admin)


    Hi Julia,

    Yes, that is correct.
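To illustrate why the two approaches behave differently at low allele fractions, here is a deliberately simplified sketch. It is not HaplotypeCaller's or MuTect2's actual model (those also use per-read base qualities, local reassembly and priors). It compares a diploid genotype model, where the expected alternate-read fraction is roughly 0, 0.5 or 1, with a model that lets the allele fraction float, as somatic callers generally do. The flat 1% error rate and treating every sequencing error as producing the alternate base are assumptions made for brevity; the function names are hypothetical.

```python
import math

def log10_lik(alt_reads, depth, p_alt):
    """log10 binomial likelihood of the read counts, omitting the binomial
    coefficient (it is identical for every model being compared)."""
    ref_reads = depth - alt_reads
    return alt_reads * math.log10(p_alt) + ref_reads * math.log10(1 - p_alt)

def compare_models(alt_reads, depth, error=0.01):
    # Diploid model: expected alt fraction is ~0, 0.5 or ~1 depending on genotype.
    diploid = {
        "0/0": log10_lik(alt_reads, depth, error),
        "0/1": log10_lik(alt_reads, depth, 0.5),
        "1/1": log10_lik(alt_reads, depth, 1 - error),
    }
    # Free allele-fraction model: use the observed fraction, clamped away
    # from 0 and 1 so the logarithms stay finite.
    af = min(max(alt_reads / depth, error), 1 - error)
    free_af = log10_lik(alt_reads, depth, af)
    return diploid, af, free_af

if __name__ == "__main__":
    diploid, af, free_af = compare_models(alt_reads=5, depth=100)
    best = max(diploid, key=diploid.get)
    print(best, {g: round(v, 1) for g, v in diploid.items()}, af, round(free_af, 1))
```

With 5 alternate reads out of 100, the diploid model's best genotype is 0/0 (about -10.4) while 0/1 scores about -30.1, so the site looks like reference plus noise. The free allele-fraction model scores about -8.6 at an allele fraction of 0.05, i.e. it explains the reads better than 0/0; that is the kind of evidence a somatic caller is designed to weigh and a diploid germline model discards.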


  • Juls (Member)

    Thank you for your help!
