[GATK4 beta] no filter-passing variants in Mutect2 tumor-only runs using default parameters

Hello,
I would like to ask your advice on the tumor only mode of Mutect.
I ran GATK4 beta.3's Mutect on 20 tumor samples using tumor-only mode, and found no variant passing filters. Every variant is filtered out after running FilterMutectCalls tool. It seems that germline risk is estimated very high overall.
Mutect2 was executed using the scripts/mutect2_wdl/mutect2_multi_sample.wdl in the GATK source repository. gnomAD is given for the population af source and default parameters are used.
I'd appreciate it if you would help run tumor-only mode of Mutect.
FYI, 10^P_GERMLINE (where P_GERMLINE in INFO is the log10 posterior probability that the alt allele is a germline variant) for one tumor sample is distributed as below. Outliers are not plotted for the sake of simplicity.
Summary(10^P_GERMLINE)
Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
---|---|---|---|---|---|
0.04699 | 0.93345 | 0.99919 | 0.94155 | 1.00000 | 1.00000 |
(Additionally, none of the toolbar buttons, such as bold, italic, and file upload, work on the 'ask a question' page where I am writing this question. Is it just me?)
Comments
@dayzcool
Hi,
That is interesting. We don't fully support tumor-only mode right now, but you can play around with the default filters in FilterMutectCalls. I think increasing --max_germline_posterior will help with the germline risk filter.
How many variants are called before filtering? Are 100% of the variants filtered out because of "germline risk"? Can you post a few example records after filtering?
Thanks,
Sheila
Hi @Sheila,
Thank you for your help! As you see in the plot, the germline risk of most variants is 90+% in tumor-only mode. Thus, --max_germline_posterior needs to be set ridiculously high for variants to pass (e.g. 0.999). At a max_germline_posterior of 0.5, which looks very large (default value is 0.025?), there is still no passing variant from 20 samples; some variants pass the germline_risk filter, but none pass all filters. The number of unfiltered variants in a sample is ~100,000.
These samples are actually paired; Mutect gives about 1000 filter-passing variants per pair if the same samples are processed using tumor-normal mode.
Here are a few variants from a tumor-only run:
@dayzcool
Hi,
It sounds like the germline risk filter is indeed supposed to be very stringent when running in tumor-only mode. The AF in the germline resource basically acts as a hard filter. For example, if the AF in the germline resource for a site is 0.2 and the AF of the site in your tumor sample is less than that, the site will be filtered out.
With a matched normal sample, the filter is not so stringent. The AF of the germline resource is also taken into account (it may not be as high as in the population).
I hope that helps.
-Sheila
P.S. Are ~1000 variants passing out of ~100,000 even with the tumor-normal matched pairing?
@Sheila, thanks for your comments. Yes, the number of 'PASS' variants ranges from 100 to 1000 for the samples I ran through Mutect in tumor-normal mode.
@dayzcool
Hi,
Thanks. Let me check with the team if that is appropriate. It seems a little low to me. Are you working with whole genomes or exomes?
-Sheila
@Sheila, these are exome samples. How many more filter-passing variants do you expect to see?
@dayzcool
Hi,
Yes, I was thinking it was a low number of passing variants.
It turns out af_of_alleles_not_in_resource defaults to 0.001, a value intended for runs where you do not input a germline resource. In your case, you have input the gnomAD resource, so you would set --af_of_alleles_not_in_resource to 1/400,000.
The rationale behind this (from the developer): af_of_alleles_not_in_resource defaults to 0.001, which is the average germline heterozygosity and is appropriate when no germline resource is given. If a germline resource is given, af_of_alleles_not_in_resource should be set to 1 / (2 × number of samples in the germline resource). The exonic regions in the gnomAD VCF have > 200,000 samples, while the rest of the genome has only ~16,000 samples. Thus af_of_alleles_not_in_resource should be 1/400,000 for WES and 1/32,000 for WGS. It is tricky to pick a single default because of the difference in the number of genome and exome samples, so the default simply assumes no germline resource.
I hope that makes sense.
-Sheila
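The rule of thumb quoted above, 1 / (2 × number of samples), can be checked with a few lines of Python. This is only a minimal sketch: the function name is hypothetical, and the sample counts are the approximate figures given in the post, not exact gnomAD numbers.

```python
def af_of_alleles_not_in_resource(n_samples):
    """Return 1 / (2 * n_samples), the rule of thumb quoted from the developer.

    Each diploid sample contributes two alleles, so an allele absent from a
    resource of n_samples individuals was not seen in 2 * n_samples alleles.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / (2 * n_samples)


# Approximate sample counts taken from the post above (assumptions, not exact).
GNOMAD_EXOME_SAMPLES = 200_000
GNOMAD_GENOME_SAMPLES = 16_000

wes_value = af_of_alleles_not_in_resource(GNOMAD_EXOME_SAMPLES)   # 1/400,000
wgs_value = af_of_alleles_not_in_resource(GNOMAD_GENOME_SAMPLES)  # 1/32,000
print(f"WES: {wes_value:.3g}  WGS: {wgs_value:.3g}")
```

The same arithmetic applies to any custom germline resource: plug in the number of individuals it contains.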
@Sheila, thank you for your kind explanation. It does make sense and I would definitely reconfigure --af_of_alleles_not_in_resource.
It seems that Mutect could possibly call 100-200 more variants filtered only by germline_risk filter.
FYI, here is frequency of filters in one Mutect call using tumor-normal mode. (1085 PASS calls and 170 calls only by germline_risk)
https://drive.google.com/open?id=0B-jmIuKNMWYeUnkzUzluR2Q0WGM
@dayzcool
Hi,
I see. Alright, let us know how setting --af_of_alleles_not_in_resource changes things. You may also try playing around with the other filters in FilterMutectCalls. Mutect2 is meant to be very sensitive so it does not miss any potential variation.
-Sheila
I have been having a similar problem, and it's unclear to me how to choose the 'best' value of --max_germline_posterior. I'm using WGS samples, and I've made sure to adjust --af_of_alleles_not_in_resource to reflect that I'm using WGS and gnomAD. Without changing the default --max_germline_posterior, I'm left with around 20 mutations per sample (in a cohort of 26). But when I bump to --max_germline_posterior = 0.999, I'm getting ~600K on average. How should I go about finding the sweet spot here? How do I know I've found it? Any guidance at all is greatly appreciated. Thank you!
@ncamarda
Hi,
Interesting. I need to check with the developers and get back to you.
-Sheila
@ncamarda
Hi again,
From the developer:
"I think that for tumor-only calling setting af-of-alleles-not-in-resource to 0 is fine. The germline posterior probabilities for alleles not in gnomAD won't be right, but that's not the important thing. Basically this setting amounts to a guarantee not to filter variants that are not in gnomAD, which is sensible.
I am currently testing a code change that uses knowledge of the local allele fraction of germline hets to greatly improve germline filtering. The basic idea is that if the allele fraction is, say, 0.2, and a germline het would have allele fraction 0.5 then the variant is most likely somatic. This change should be out within a few weeks."
I hope this helps.
-Sheila
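The two points in the developer's reply can be illustrated with a toy calculation. This is not Mutect2's actual model (the real math is in the Mutect2 notes in the GATK repository); it is a sketch under stated assumptions: only two hypotheses (germline het vs. somatic), a Hardy-Weinberg het prior from the population AF, binomial read-count likelihoods, and a made-up `somatic_prior`. All function and parameter names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n draws with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def toy_germline_posterior(alt_count, depth, pop_af, somatic_prior=1e-6):
    """Toy posterior probability that a variant is a germline het.

    Assumptions (illustrative only, depth must be > 0):
      - germline prior is the Hardy-Weinberg het frequency 2 * f * (1 - f)
      - a germline het produces alt reads at allele fraction 0.5
      - a somatic variant produces alt reads at the observed allele fraction
      - somatic_prior is an arbitrary placeholder value
    """
    germline_prior = 2 * pop_af * (1 - pop_af)
    germline_lik = binom_pmf(alt_count, depth, 0.5)
    somatic_lik = binom_pmf(alt_count, depth, alt_count / depth)
    numerator = germline_prior * germline_lik
    denominator = numerator + somatic_prior * somatic_lik
    return numerator / denominator if denominator > 0 else 0.0


# Imputed AF of 0 for an allele missing from the resource: posterior is 0,
# so the toy germline filter can never fire for that allele.
print(toy_germline_posterior(alt_count=20, depth=100, pop_af=0.0))

# Allele fraction near 0.5 at a common population variant: looks germline.
print(toy_germline_posterior(alt_count=50, depth=100, pop_af=0.2))

# Allele fraction near 0.2: a het at 0.5 explains the reads poorly.
print(toy_germline_posterior(alt_count=20, depth=100, pop_af=0.2))
```

In this toy setting, an observed allele fraction far from 0.5 pulls the germline posterior down even when the allele is common in the population, which is the intuition behind the code change the developer describes.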
Hi,
I am having a similar issue: after Mutect2 filtering, none of the variants pass. This is happening with both tumor-only and matched-normal data.
I am using following resources for Dog Genome:
1. 68_sample PON
2. Germline resource with 722 individuals and --af-of-alleles-not-in-resource = 0.000693
Mutect2 command:
I got 0 passed calls in this run. As per the suggestions above, I set --af-of-alleles-not-in-resource=0 in a separate run, but I still do not get any PASS calls.
I am not sure to what extent the contamination estimate affects this, but I am getting 0 contamination in most of my samples.
Below is the high-level summary of filters applied with af-of-alleles-not-in-resource=0 and af-of-alleles-not-in-resource=0.000693.
Do you have further suggestions? Is changing --max_germline_posterior recommended with af-of-alleles-not-in-resource=0?
Ah, thankfully I'm not the only one with this issue.
I am wondering if it's some bug in the FilterMutectCalls in gatk4 ..?
I am working with mouse whole genomes, and I have called matched tumor-normal pairs with a germline resource (called in house) and default parameters. My numbers aren't big (up to 10 pairs), but I ran the gatk 3 Mutect pipeline on our first batch (of ~5) about a year ago, and I never had problems with getting variants to PASS.
Now I reprocessed the data with an additional batch of paired samples, and with gatk 4.0.2.1 and 4.0.3.0, and I am not getting anything to PASS. Nothing has changed at least in the samples included from batch 1 that were re-run again ..
Thanks @vsvinti for your comments. I am also a bit relieved that it's a common issue and that Team GATK might find us a good solution. My initial run was with gatk 4.0.0.0, but I see that the issue persists in gatk 4.0.3.0.
In both cases, it is non-human data, and I am wondering if we need to set up some parameters differently for this? In the meantime, I will revisit an older version of Mutect to call the SNPs.
Awaiting response/suggestions from Team GATK.
What are your suggestions, @Sheila ?
@vsvinti @sutturka
Hi,
Can you post some example records that are failing? @vsvinti Do you have some sites that passed in GATK3 that are now failing in GATK4? Can you post those?
Also, can you comment on which filters seem to be failing most of the variants?
Thanks,
Sheila
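One way to answer the question of which filters are failing most variants is to tally the FILTER column (the 7th tab-separated column of a VCF record). The sketch below uses only the Python standard library; the function name and the example records are made up for illustration and are not taken from anyone's data in this thread.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count FILTER values in an iterable of VCF text lines.

    Returns two Counters:
      per_filter - how often each individual filter name appears
      per_record - how often each exact FILTER string appears, which shows
                   records failed by one filter only (e.g. 'germline_risk')
    """
    per_filter = Counter()
    per_record = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        filter_value = fields[6]  # FILTER is the 7th VCF column
        per_record[filter_value] += 1
        per_filter.update(filter_value.split(";"))
    return per_filter, per_record


# Made-up records purely for illustration.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "1\t200\t.\tC\tG\t.\tgermline_risk\t.",
    "1\t300\t.\tG\tA\t.\tclustered_events;germline_risk\t.",
]
per_filter, per_record = tally_filters(example)
for name, count in per_filter.most_common():
    print(name, count)
```

For a real file, pass an open file handle instead of the list (for a bgzipped VCF, `gzip.open(path, "rt")` from the standard library reads it as text).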
@Sheila
Here is a handful of records for the same matched pair run with the two versions:
I haven't looked at all the annotations, but all variants seem to be assigned germline_risk (no records after grep -v germline_risk). I have run both versions with and without a germline resource, and the problem is the same. When using a germline resource, I generated a germline set of calls from my normals using HaplotypeCaller, as there isn't a resource for known variants. For gatk3, I also ran versions with a PON, but regardless of the details, I always got variants to PASS.
I must say I don't understand the difference between providing the normal germline as a resource, or as a PON, but that's off topic. It's not making a difference to the PASS problem.
Hi @vsvinti, your gatk4 calls look similar to my gatk-4.0.2.1 calls where tumor-only mode is used, because germline_risk is estimated very high for every variant, and INFO column doesn't have NLOD or N_ART_LOD. IMHO, it may be worthwhile to double check if tumor-normal mode is used for gatk4.
In addition, I'd like to share the GATK doc on PON and germline resource, which I found very helpful:
Somatic calling is NOT simply a difference between two callsets
Differences between GATK3 MuTect2 and GATK4 Mutect2
@dayzcool
Thanks a mil for your input. Now that I look again at my command, you may be right! I give it two bam files, but thought I only need to tell it which one is tumor, and it will figure that other one is normal. Perhaps it's not, and it's doing tumor only, so maybe I need to specify --normal-sample. I should have thought of that before!
Thanks for the very useful links above - there's so much documentation that I didn't come across those particular ones. I haven't seen any docs suggesting generating a germline resource from the normals using haplotype caller, but I thought it must be alright to derive AFs - perhaps not?!
According to the second link, "The tool filters variant sites in the panel of normals (PoN) or the matched normal", so perhaps there's no benefit in doing both?!
I have two sets of data.
Tumor only - VCF generation and FilterMutectCalls run with GATK 4.0.0.0
Tumor-Normal pair - VCF generation and FilterMutectCalls run with GATK 4.0.3.0
Both data sets are run with --af-of-alleles-not-in-resource = 0.000693
In both cases, I do not get any PASS calls after the FilterMutectCalls step.
Below is the summary of filters applied:
In the Tumor only data, germline_risk was applied to all the calls, followed by panel_of_normals and clustered_events.
In a different sample from the Tumor only data, I tried af-of-alleles-not-in-resource = 0 (see post above), which reduced the number of germline_risk calls but still gave no PASS calls.
Below is the summary of filters applied:
In the Tumor-Normal pair data, the number of germline_risk calls was sufficiently low, but many calls are filtered by clustered_events and artifact_in_normal.
@vsvinti, I am glad you found it helpful! I agree with you that there is no benefit in doing both with the same set of normal samples. I am not sure how many normal samples you have. I would probably not try to build a germline resource myself unless 1000s of normal samples are available. I am generally happy with the PON filter.
@sutturka, thanks for the interesting summary. I am seeing a similar picture from my human data. To me, it was helpful to evaluate filters manually and adjust thresholds, especially the ones playing a huge role.
Looking at the large number for artifact_in_normal, I was curious whether your matched normals could be contaminated by tumor (residual tumor cells?). I had the contamination issue and had to adjust parameters accordingly. FilterMutectCalls provides some guidance. You may already be aware of it though.
Hi,
@vsvinti
There may be benefit if the artifact is in the PoN and not in the normal or vice versa. Sometimes these artifacts strike at random and may not be caught in one or the other. Have a look at the Mutect2 hands on tutorial in the Presentations section for more information.
@dayzcool
Thank you for sharing your input!
@sutturka
Those are a lot of calls filtered out by clustered events in the second part. Can you post some example bamout shots of those sites? Also, did you try af-of-alleles-not-in-resource = 0 with the tumor-normal data? I am wondering if playing around with that will help with the germline risk filter. Also, are you indeed inputting a germline resource file?
Thanks,
Sheila
@dayzcool
Thank you for the suggestions. I do see some level of contamination (0.000139) in SampleA but not in sampleB. I will go through the FilterMutectCalls guidance in detail.
@vsvinti
How do your calls look after using the correct command? Do you get PASS calls with GATK4*?
@sheila
I have attached a few screenshots with clustered_events calls. I do see a higher density of calls around these positions. Do you have any additional suggestions for adjusting thresholds for clustered_events or artifact_in_normal?
I have started a run with af-of-alleles-not-in-resource = 0 and will update you when results are available.
Yes, I am using the germline resource file with --germline-resource /VCF/722g.990.SNP.INDEL.chrAll.vcf.gz and I do not get any error.
In the meantime, I am running the old MuTect2 (GATK/3.8.0) with paired-normal data. It is not yet complete, but as I glanced through the VCF file, I do see PASS calls with the same data. Below are the commands I used for both versions.
Below are the calls which differ between GATK 3.8.0 and GATK4*
Although the GATK/3.8.0 run is not complete, so far I see >2000 PASS calls. When I extract the same positions from the GATK/4.0.3.0 filtered VCF, I see that >1600 of the calls have . in the FILTER column. I wonder if something is wrong in the assignment of PASS calls in GATK/4.0.3.0?
@Sheila
I do not see any PASS calls in the tumor-normal paired data with af-of-alleles-not-in-resource = 0. The only change I see is fewer calls filtered by the germline_risk filter. But anyway, the germline_risk calls were very few with matched normal data.
Hi @sutturka
Yes! Once I specify my matched normal, I get between 10-50k variants with PASS filter, depending on what samples I'm comparing. That's lots, but I don't have any resources to add to the filtering .. perhaps that may be a bit lower if I were to use PON.
Thank you @vsvinti for the update.
@Sheila Do you have any further suggestion regarding my data posted above? I am running old MuTect2 (GATK/3.8.0) with other samples and with each one I see the PASS calls but not in the newer version? May be worth sending this data to developers as test case?
@sutturka
Hi,
Sorry for the delay. I think this thread will help with the clustered_events filter. For the artifact_in_normal filter, you can try increasing --normal-artifact-lod.
Ah, this was an issue that has since been fixed.
Sorry for the confusion. The PASS was a . for a little bit, but it is now back to being PASS.
-Sheila
Thanks @Sheila for the suggestions.
So this issue has been fixed in GATK/4.0.4.0? Just wanted to make sure I am using the correct version.
@sutturka
Hi,
Yes, the issue is fixed in 4.0.4.0
You can read more in the release notes.
-Sheila
Hi Sheila,
I wonder if gatk 4.0.8.1 Mutect2 includes a patch related to the following reply from a developer, i.e., using the allele fraction of germline hets for tumor-only somatic calling.
Also, if I set af-of-alleles-not-in-resource = 0, wouldn't it render the use of a germline resource optional? With a zero cut-off, would none of the germline resource variants be flagged (filtered out) from Mutect2 tumor-only calls?
Thanks,
Samir
@Samir
Hi Samir,
I don't think so.
It is probably best not to set it to 0. But, setting it to a very small value will indeed cause fewer variants to be flagged as germline.
-Sheila
Hi,
At this moment we are comparing the results of Mutect2 on tumor-normal and tumor-only mode.
In tumor-only mode some variants (including 2 driver mutations) are filtered out based on germline_risk. In tumor-normal mode, these variants are labeled as PASS.
I understand that a variant gets the label germline_risk when 10^P_GERMLINE > --max_germline_posterior (default of --max_germline_posterior = 0.1).
However, it is not entirely clear to me which variables affect the value of P_GERMLINE and how this calculation goes. Can you tell us a bit more about this?
Some relevant settings we use:
Mutect2 4.0.6.0
Germline resource: gnomAD
--af_of_alleles_not_in_resource: 0.000004
Erik
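The germline_risk rule as stated in the post above (flag a variant when 10^P_GERMLINE exceeds --max_germline_posterior) can be written out as a small check. This is a sketch of the rule as described in this thread, not FilterMutectCalls code; the function name is hypothetical, and the default of 0.1 is the value quoted in the post (an earlier post guesses a different default).

```python
import math


def is_germline_risk(p_germline_log10, max_germline_posterior=0.1):
    """Apply the rule from the thread: flag when 10^P_GERMLINE > threshold.

    p_germline_log10 is the P_GERMLINE INFO value, i.e. the log10 posterior
    probability that the alt allele is a germline variant.
    """
    return 10 ** p_germline_log10 > max_germline_posterior


# Posterior probabilities from the summary table in the first post,
# converted back to log10 to mimic the P_GERMLINE annotation.
minimum = math.log10(0.04699)
median = math.log10(0.99919)

print(is_germline_risk(minimum))                               # below 0.1
print(is_germline_risk(median))                                # above 0.1
print(is_germline_risk(median, max_germline_posterior=0.999))  # still above
```

Applied to that summary table, the rule shows why so little passed in the tumor-only runs: the median posterior is above even a threshold of 0.999.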
Hi @Erik84
Sorry about the delay in response. I will have to ask the developers about this and get back to you by Monday.
Regards
Bhanu
Hi @Erik84
For Mutect2, all the information about the algorithm is described in the document https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf , but briefly, Mutect2 does something very similar to what HaplotypeCaller does - it uses the population allele frequencies as a prior and goes through the genotypes for each site and determines likelihoods based on whether it's hom ref, hom var, or het. Then it uses the likelihoods and prior to create the posterior probability.
Regards
Bhanu
Hi Bhanu,
Thanks for your reply and for pointing me to this documentation of mathematical notes. This is what I was looking for.
Erik
Hi @Samir
Yes, we did this a few months ago.
This parameter sets the AF that we impute for alleles missing from the germline resource. Setting it to zero means alleles missing from gnomAD won't get filtered but doesn't affect those in the resource.