tumor cohort versus normal cohort variation comparison

Dear GATK group,
We aim to compare SNP/indel differences between a tumor cohort and a normal cohort. Going through your documentation, we see the GATK tumor/normal pair workflow, which seems to be based on one tumor sample and one corresponding normal sample. We also see the cohort approach, which includes a joint variant calling step in the HaplotypeCaller step. I am wondering how to do a tumor/normal pair analysis based on a tumor cohort and a normal cohort, using the joint variant calling option across all tumor samples and across all normal samples (and also realigning tumor and normal samples together).
Our current approach is to merge all tumor BAM files into one sample and all normal BAM files into one sample, then run the tumor/normal pair workflow. But cohort joint variant calling across all tumor samples and all normal samples is not possible on the merged BAM files, is that right? In other words, do we have to keep every tumor sample separate in order to perform joint variant calling in the HaplotypeCaller step?
Looking forward to hearing your suggestions.


  • Sheila, Broad Institute (Member, Broadie ✭✭✭✭✭)

    Hi Duan,

    You should be using MuTect2 for somatic variation detection.

    What exactly is in your normal cohort and tumor cohort? How many samples are in each and do they come from the same tissue?

    If you do not have matched tumor/normal pairs, you can use a Panel of Normals made from your normal cohort. Have a look at this article, which gives more information.


  • duan, MIT (Member)

    Dear Sheila,
    Thank you so much for your help. Our tumor cohort currently contains about 20 samples, and our normal cohort currently contains 15 samples. The sample size will grow in the future. I do use MuTect2, but we also want to use the GATK joint variant calling approach to compare the results.

  • shlee, Cambridge (Member, Broadie ✭✭✭✭✭)

    Hi @duan,

    Given every tumor is unique, much like a snowflake, we recommend comparing matched tumor and normal pairs. Our somatic workflows aim to find somatic mutations and to remove the backdrop of germline variation. Let me assume this is also your goal. To reiterate, germline variation may contribute to risk of developing cancer but we assume there are mutations that drive tumorigenesis and these are in the tumor and not the normal tissue.

    Merging all the normals into a single sample and merging all the tumors into a single sample obliterates your ability to contrast a matched pair. Instead, you should keep sample identities intact (defined by the read group SM tag) even if you process the samples jointly. In addition, be sure to process each matched pair jointly.
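
    To illustrate what "sample identities defined by the read group SM tag" means in practice, here is a minimal sketch in Python. It is not part of GATK; the function names and the example header (including the sample names) are hypothetical. It only parses the @RG lines of a SAM/BAM header, given as text, and reports which sample (SM) each read group (ID) belongs to.

```python
def samples_by_read_group(header_text):
    """Map each @RG ID to its SM (sample) value, given SAM header text."""
    rg_to_sample = {}
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue  # only read group lines carry the SM tag
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(
            field.split(":", 1) for field in line.split("\t")[1:] if ":" in field
        )
        if "ID" in fields:
            rg_to_sample[fields["ID"]] = fields.get("SM")
    return rg_to_sample


def distinct_samples(header_text):
    """Return the sorted list of distinct SM values found in the header."""
    return sorted({sm for sm in samples_by_read_group(header_text).values() if sm})


# Hypothetical header for illustration only; real sample names will differ.
example_header = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@RG\tID:rg1\tSM:patient01_tumor\tPL:ILLUMINA",
    "@RG\tID:rg2\tSM:patient01_normal\tPL:ILLUMINA",
])

print(samples_by_read_group(example_header))
print(distinct_samples(example_header))
```

    The header text can be obtained with `samtools view -H`. If every read group in a merged file reports the same SM value, the tools will treat the reads as coming from one sample, which is the loss of sample identity described above.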

    I should mention that MuTect2 is in beta and thus still undergoing development. So it's great that you are comparing different workflows that include MuTect2. We'd be interested in hearing more about your approach and the results of the comparisons.
