Optimal values of parameters for GATK188.8.131.52 FilterMutectCalls on Tumor-Only Mutect2 VCF
I was hoping someone could help me understand (or at least learn how to titrate) the values of the parameters in FilterMutectCalls, so that I can maximize sensitivity without getting too many false positives in the tumor-only case. I intend this post to serve as a reference for future users facing the same problem, as I've searched the internet extensively and have not been able to find any recent posts or discussions. So, please forgive the length of this post.
The parameters that I "think" are relevant:
--max-germline-posterior, and maybe
--log-somatic-prior. But I could really use some help determining whether these are the correct parameters or whether there are others I should consider.
Background and attempts to understand param values
Currently, I'm running a command like this:
java -Xmx4g -jar gatk-package-184.108.40.206-14-g95430b1-SNAPSHOT-local.jar FilterMutectCalls \
    --variant SDS1-000196-1-H3.vcf \
    --contamination-table SDS1-000196-1-H3.contamination.table \
    --output SDS1-000196-1-H3.fcn.vcf \
    --tumor-lod $TLOD \
    --tumor-segmentation SDS1-000196-1-H3.tumor_segments.table \
    --max-germline-posterior $MGP
To understand how changing $TLOD and $MGP affects the number of mutations that "PASS", I ran the above command with various values of $TLOD and $MGP on a cohort of 43 tumor-normal pairs plus my 1 tumor-only sample. We suspect that these samples have a low mutation rate (this is a rare blood disease with no previous WES/WGS studies to give a ballpark mutation rate; some samples are pre-cancer while others are full-blown cancer).
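For anyone who wants to reproduce this kind of sweep, here is a minimal sketch of how it could be scripted. It is not the exact script I used: the value grids, the `GATK_JAR` variable, the `filter_cmd` helper, and the output naming scheme are all made up for illustration, and only the flags come from the command above. The script only prints the commands, so they can be inspected before being run.

```shell
#!/bin/sh
# Hypothetical one-at-a-time sweep over --tumor-lod and --max-germline-posterior.
# SAMPLE, GATK_JAR, the value grids, and the output names are illustrative assumptions.
SAMPLE="SDS1-000196-1-H3"
GATK_JAR="${GATK_JAR:-gatk.jar}"   # set this to the path of your GATK jar

# Print one FilterMutectCalls command for a given TLOD ($1) and MGP ($2).
filter_cmd() {
  echo java -Xmx4g -jar "$GATK_JAR" FilterMutectCalls \
    --variant "${SAMPLE}.vcf" \
    --contamination-table "${SAMPLE}.contamination.table" \
    --tumor-segmentation "${SAMPLE}.tumor_segments.table" \
    --tumor-lod "$1" \
    --max-germline-posterior "$2" \
    --output "${SAMPLE}.tlod_${1}.mgp_${2}.vcf"
}

# Vary TLOD while holding MGP at its default of 0.1.
for tlod in 3 5.3 8 12; do
  filter_cmd "$tlod" 0.1
done

# Vary MGP while holding TLOD at its default of 5.3.
for mgp in 0.01 0.05 0.1 0.5; do
  filter_cmd 5.3 "$mgp"
done
```

Piping the printed lines to `sh` would actually execute them. Each run writes to its own output file, so results from different parameter values do not overwrite one another.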
The TLOD plot shows the number of PASS mutations per pair/sample with varying $TLOD and fixed MGP=0.1 (default); TLOD=5.3 is colored red in this plot. The MGP plot shows the number of PASS mutations per pair/sample with varying $MGP and fixed TLOD=5.3 (default). The results are attached below. As the MGP plot in particular shows, changing these values affects the number of mutations for all pairs/samples, but it affects the tumor-only sample far more drastically (I've marked that sample with a red blip on the x axis of the plot). This sample has markedly more mutations that PASS the default filters for some reason; I have a feeling that they are all germline events.
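The per-sample counts behind plots like these can be obtained by counting records whose FILTER column is PASS. The following is a rough sketch that assumes uncompressed, tab-delimited VCF files; `count_pass` is a hypothetical helper name, not a GATK tool.

```shell
#!/bin/sh
# Count the records in a VCF whose FILTER column (7th tab-separated field) is exactly PASS.
# Assumes an uncompressed VCF; header lines starting with '#' are skipped.
count_pass() {
  awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Print "<file><TAB><count>" for every VCF given on the command line.
for vcf in "$@"; do
  printf '%s\t%s\n' "$vcf" "$(count_pass "$vcf")"
done
```

Saved under a name of your choosing and run over the filtered VCFs from a sweep, this gives one row per parameter setting, which can then be plotted against $TLOD or $MGP.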
So I wonder, assuming the default TLOD=5.3 is optimal, is there an optimal MGP for the tumor-only case? Should I even be playing with these parameter values at all? Or maybe should I just resort to a different filtering method (all ears for suggestions)? Should I be playing with the
I hope that this makes sense. Please let me know if I can help to clarify my dilemma/questions. Thank you so much for your time and consideration.