Optimal parameter values for GATK 4.0.3.0 FilterMutectCalls on a Tumor-Only Mutect2 VCF

ncamarda, DFCI (Member, Broadie)

Intro

I was hoping someone could help me understand (or at least learn how to titrate) the values of the parameters in FilterMutectCalls so that I can maximize sensitivity without getting too many false positives in the tumor-only case. I intend this post to serve as a reference for future users facing the same problem, since I've searched extensively online and have not been able to find any recent posts or discussions. So, please forgive the length of this post.

Current Understanding

The parameters that I "think" are relevant are --tumor-lod and --max-germline-posterior, and maybe --log-somatic-prior. But I could really use some help determining whether these are the correct parameters or whether there are others I should consider.

Background and attempts to understand param values

Currently, I'm running a command like this:

java -Xmx4g -jar gatk-package-4.0.3.0-14-g95430b1-SNAPSHOT-local.jar FilterMutectCalls \
--variant SDS1-000196-1-H3.vcf \
--contamination-table SDS1-000196-1-H3.contamination.table \
--output SDS1-000196-1-H3.fcn.vcf \
--tumor-lod $TLOD \
--tumor-segmentation SDS1-000196-1-H3.tumor_segments.table \
--max-germline-posterior $MGP

To understand how changing $TLOD and $MGP affects the number of mutations that "PASS", I ran the command above with various values of $TLOD and $MGP on a cohort of 43 tumor-normal pairs plus my one tumor-only sample. We suspect these samples have a low mutation rate: this is a rare blood disease, there are no previous WES/WGS studies to give a ballpark mutation rate, and some samples are pre-cancerous while others are full-blown cancer.
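The sweep described above can be scripted around the same FilterMutectCalls command. This is a minimal sketch under stated assumptions: the function names (count_pass, filter_and_count, sweep), the GATK_JAR variable, the output file naming, and the sweep values are hypothetical choices, not part of the original post. count_pass counts records whose FILTER column (the 7th VCF column) is exactly PASS.

```shell
# Hypothetical helper names and sweep values; GATK_JAR is an assumed variable
# that defaults to the jar used in the post.
GATK_JAR=${GATK_JAR:-gatk-package-4.0.3.0-14-g95430b1-SNAPSHOT-local.jar}

# Count VCF records whose FILTER column (7th tab-separated field) is exactly PASS.
count_pass() {
  awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Run FilterMutectCalls for one sample prefix with the given TLOD and MGP,
# then print: prefix <TAB> tlod <TAB> mgp <TAB> number of PASS records.
filter_and_count() {
  local prefix=$1 tlod=$2 mgp=$3
  local out="${prefix}.tlod${tlod}.mgp${mgp}.vcf"
  java -Xmx4g -jar "$GATK_JAR" FilterMutectCalls \
    --variant "${prefix}.vcf" \
    --contamination-table "${prefix}.contamination.table" \
    --tumor-segmentation "${prefix}.tumor_segments.table" \
    --tumor-lod "$tlod" \
    --max-germline-posterior "$mgp" \
    --output "$out" > "${out}.log" 2>&1 || return 1
  printf '%s\t%s\t%s\t%s\n' "$prefix" "$tlod" "$mgp" "$(count_pass "$out")"
}

# Vary TLOD with MGP fixed at the default 0.1, then vary MGP with TLOD fixed at 5.3.
sweep() {
  local t m
  for t in 3.0 4.0 5.3 6.3 8.0 10.0; do filter_and_count "$1" "$t" 0.1; done
  for m in 0.01 0.05 0.1 0.2 0.5 0.9; do filter_and_count "$1" 5.3 "$m"; done
}
```

For example, sweep SDS1-000196-1-H3 > SDS1-000196-1-H3.sweep.tsv would write one row per run (sample, TLOD, MGP, PASS count); concatenating these files across samples gives the data for per-sample plots like the TLOD and MGP plots in this post.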

The TLOD plot shows the number of PASS mutations per pair/sample as $TLOD varies, with MGP fixed at its default of 0.1; TLOD=5.3 is colored red in this plot. The MGP plot shows the number of PASS mutations per pair/sample as $MGP varies, with TLOD fixed at its default of 5.3. The results are attached below. As the MGP plot in particular shows, changing these values affects the number of mutations for all pairs/samples, but it affects the tumor-only sample far more drastically (I've marked that sample with a red blip on the x-axis). For some reason this sample has markedly more mutations that PASS the default filters, and I have a feeling that they are all germline events.

Specific Questions

So I wonder: assuming the default TLOD=5.3 is optimal, is there an optimal MGP for the tumor-only case? Should I be adjusting these parameter values at all, or should I resort to a different filtering method (I'm all ears for suggestions)? And should I be adjusting the --log-somatic-prior parameter?

I hope this makes sense. Please let me know if I can clarify my dilemma or questions. Thank you so much for your time and consideration.

Comments

  • Sheila, Broad Institute (Member, Broadie, admin)

    @ncamarda
    Hi,

    Sorry for the delay. I need to think this over and check with the team, and I'll get back to you.

    -Sheila

  • davidben, Boston (Member, Broadie, Dev)

    Hi Nick,

    A TLOD threshold of 5.3 is close to optimal. A higher threshold is defensible, since it will definitely decrease false positives. A significant rate of germline false positives is inevitable in tumor-only calling, especially with our defaults, which are set to emphasize sensitivity. To get a better sense of what to do with your data, I would like to see histograms of TLOD and AF for a representative sample.

    Also, it couldn't hurt to set --log-somatic-prior to -7.0 or -8.0. For example, -8.0 corresponds to an expectation of ~(3 x 10^9) x (10^-8) = 30 somatic mutations per genome, and -7.0 to ~300.

    -David

  • biobenkj, Grand Rapids, MI (Member)

    Just as an added caveat, in the latest version of GATK (v4.0.4.0), Mutect2 no longer has the --log-somatic-prior option.

  • davidben, Boston (Member, Broadie, Dev)

    True; that argument has been moved to FilterMutectCalls.
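David asks above for histograms of TLOD and AF for a representative sample. A minimal awk-based sketch for producing them from a Mutect2 VCF follows. It assumes TLOD is an INFO field and AF a per-sample FORMAT field, keeps only the first value on multiallelic records, uses every record (not just PASS), and assumes the tumor is in column 10 (the first sample column, as in a tumor-only VCF). For a tumor-normal VCF, pass the tumor's column number as the second argument after checking the #CHROM header line. The function names tlod_af and histogram are hypothetical.

```shell
# Print "TLOD<TAB>AF" for each record of a Mutect2 VCF.
# $1 = VCF path; $2 = 1-based column of the tumor sample (assumed 10 by default).
# Multiallelic records: only the first TLOD and first AF value are kept.
tlod_af() {
  awk -F'\t' -v col="${2:-10}" '
    /^#/ { next }
    {
      tlod = "NA"; af = "NA"
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^TLOD=/) { split(substr(info[i], 6), t, ","); tlod = t[1] }
      m = split($9, fmt, ":")
      split($col, val, ":")
      for (j = 1; j <= m; j++)
        if (fmt[j] == "AF") { split(val[j], a, ","); af = a[1] }
      print tlod "\t" af
    }' "$1"
}

# Read a TSV on stdin and count values of column $1 in bins of width $2.
# Prints "bin_start<TAB>count" sorted by bin; NA values are skipped.
# The small epsilon guards against floating-point error such as 0.3/0.1 = 2.999...
histogram() {
  awk -F'\t' -v c="$1" -v w="$2" '
    $c != "NA" { b = int($c / w + 1e-9) * w; count[b]++ }
    END { for (b in count) printf "%g\t%d\n", b, count[b] }' | sort -n
}
```

For example, tlod_af SDS1-000196-1-H3.vcf | histogram 1 2 bins TLOD in steps of 2, and tlod_af SDS1-000196-1-H3.vcf | histogram 2 0.05 bins AF in steps of 0.05.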
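The arithmetic in David's first comment (expected somatic mutations per genome = genome size x 10^prior) can be reproduced with a small helper. This sketch follows his formula in treating the prior as a log10 per-site probability and the genome as ~3 x 10^9 sites; both are approximations, and the function name expected_mutations is hypothetical. Per the comments above, the value itself goes to Mutect2 in 4.0.3.0 and to FilterMutectCalls from 4.0.4.0.

```shell
# Expected number of somatic mutations per genome implied by a log10 per-site prior.
# $1 = log10 prior (e.g. -8.0); $2 = number of sites (assumed ~3e9 by default).
expected_mutations() {
  awk -v p="$1" -v g="${2:-3e9}" 'BEGIN { printf "%.0f\n", g * 10 ^ p }'
}
```

For example, expected_mutations -8.0 prints 30, expected_mutations -7.0 prints 300, and expected_mutations -6.0 prints 3000.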
