GATK Workflow for Cancer

gaiusjaugustus (Arizona, USA; Member)
edited July 2015 in Ask the GATK team

I am new to bioinformatics and would like some advice on changes to the GATK workflow for cancer. I was told that the cancer workflow is different, and I see that several different tools are available.

I have exome data from tumor and normal samples. I have aligned them and have a BAM file for each sample. I am interested in identifying somatic variants.

The current workflow as I understand it is:
-(Non-GATK) Picard MarkDuplicates or Samtools rmdup
-Indel Realignment (RealignerTargetCreator + IndelRealigner)
-Base Quality Score Recalibration (BaseRecalibrator + PrintReads)
-Annotation using Oncotator (?)
-MuTect (identify somatic mutations)

My questions are:
1. Is the above workflow reasonable/correct for what I'm trying to do?
2. Is there any difference running samples one pair at a time, or running them all together? (I have 57 pairs. Should I do 57 runs of normal-tumor pairs, or 1 run of all 57 pairs?)
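For the "57 runs of normal-tumor pairs" option in question 2, the bookkeeping might look like the sketch below. It only groups samples into pairs and does not say which option is correct. The patient IDs, file names, and the `pair_samples` helper are hypothetical placeholders, not part of GATK.

```python
from collections import defaultdict


def pair_samples(samples):
    """Group (patient_id, sample_type, bam_path) records into tumor/normal pairs.

    Raises ValueError if a patient has a duplicate or missing sample,
    so an incomplete pair is caught before any analysis is launched.
    """
    by_patient = defaultdict(dict)
    for patient_id, sample_type, bam_path in samples:
        if sample_type not in ("tumor", "normal"):
            raise ValueError(f"unknown sample type: {sample_type!r}")
        if sample_type in by_patient[patient_id]:
            raise ValueError(f"duplicate {sample_type} sample for {patient_id}")
        by_patient[patient_id][sample_type] = bam_path

    pairs = []
    for patient_id in sorted(by_patient):
        bams = by_patient[patient_id]
        if set(bams) != {"tumor", "normal"}:
            raise ValueError(f"incomplete pair for {patient_id}")
        pairs.append(
            {"patient": patient_id, "tumor": bams["tumor"], "normal": bams["normal"]}
        )
    return pairs


# Hypothetical sample sheet: 57 patients, one tumor and one normal BAM each.
samples = []
for i in range(1, 58):
    pid = f"P{i:02d}"
    samples.append((pid, "tumor", f"{pid}_tumor.bam"))
    samples.append((pid, "normal", f"{pid}_normal.bam"))

pairs = pair_samples(samples)
print(len(pairs), "pairs, one run each")
```

Each entry in `pairs` would then drive one tumor-normal run.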

Thank you,


  • gaiusjaugustus (Arizona, USA; Member)

    I've gotten this answer before (just now found it):

    "We (GATK docs team) are working on some docs for the somatic variant calling use case. In a nutshell, you'll need to do an additional pre-processing step called co-cleaning where you perform indel realignment on the tumor and normal in a pair together, use ContEst to estimate cross-sample contamination, use MuTect to call variants (not HC, which is not able to call low-AF variants like MuTect), do some manual filtering and processing to eliminate artifacts (VQSR is not appropriate for somatic calls) and finally annotate with Oncotator. " -vdauwera

    But I am unsure what I can use to do the co-cleaning, and where the other steps go in my workflow. Does this mean I don't need to use Picard/Samtools, Indel Realignment, etc.? Do I ONLY need to use the workflow here?

    -ContEst (estimate cross-sample contamination)
    -MuTect (Call variants)
    -Eliminate artifacts (How?)
    -Oncotator (Annotate)

    And my #2 question above still stands.
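The quoted answer describes co-cleaning as indel realignment run on the tumor and normal of a pair together. With the GATK 3-era tools named earlier in the thread, that could be sketched as below. This is not an official recipe. The jar name and all file paths are placeholder assumptions, and the flags (`-T`, `-R`, `-I`, `-known`, `-o`, `-targetIntervals`, `-nWayOut`) are the GATK 3 options as I understand them, so check them against the documentation for your installed version. The script only builds and prints the commands; it does not run them.

```python
import shlex


def cocleaning_commands(tumor_bam, normal_bam, reference, known_indels,
                        intervals_out, gatk_jar="GenomeAnalysisTK.jar"):
    """Build the two co-cleaning commands for one tumor/normal pair.

    Both BAMs are passed with -I to the same invocation, so realignment
    targets are found and applied across the pair jointly.
    """
    base = ["java", "-jar", gatk_jar]
    pair_inputs = ["-I", tumor_bam, "-I", normal_bam]

    # Step 1: find intervals that need realignment, using both BAMs.
    target_creator = base + [
        "-T", "RealignerTargetCreator",
        "-R", reference,
        *pair_inputs,
        "-known", known_indels,
        "-o", intervals_out,
    ]

    # Step 2: realign both BAMs over those intervals.
    # -nWayOut (assumed GATK 3 option) writes one output BAM per input BAM.
    realigner = base + [
        "-T", "IndelRealigner",
        "-R", reference,
        *pair_inputs,
        "-known", known_indels,
        "-targetIntervals", intervals_out,
        "-nWayOut", ".realigned.bam",
    ]
    return [target_creator, realigner]


# Placeholder paths for a single hypothetical pair.
commands = cocleaning_commands(
    tumor_bam="P01_tumor.bam",
    normal_bam="P01_normal.bam",
    reference="reference.fasta",
    known_indels="known_indels.vcf",
    intervals_out="P01.intervals",
)
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

In this sketch, duplicate marking would still come before co-cleaning, and the remaining steps from the quoted answer (ContEst, MuTect, artifact filtering, Oncotator) would follow it.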

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    As an update (since it was pointed out to us that this post comes up when searching for GATK + cancer), we have some new workflows coming out with GATK4 enabling somatic analysis of SNPs and indels (Mutect2) and CNVs (GATK4-CNV). We'll post more details in the Best Practices section in the near future.
