[GATK4 beta] no filter-passing variants in Mutect2 tumor-only runs using default parameters

dayzcool, Member
edited August 13 in Ask the GATK team

Hello,

I would like to ask your advice on the tumor-only mode of Mutect.
I ran Mutect from GATK4 beta.3 on 20 tumor samples in tumor-only mode and found no variants passing filters: every variant is filtered out after running the FilterMutectCalls tool. The germline risk seems to be estimated as very high overall.
Mutect2 was executed using scripts/mutect2_wdl/mutect2_multi_sample.wdl from the GATK source repository. gnomAD was given as the population AF source, and default parameters were used.
I'd appreciate any help with running Mutect in tumor-only mode.

FYI, P_GERMLINE in the INFO field is the log10 posterior probability that the alt allele is a germline variant. The distribution of 10^P_GERMLINE for one tumor sample is summarized below. Outliers are not plotted, for simplicity.

Summary(10^P_GERMLINE)

Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
0.04699  0.93345  0.99919  0.94155  1.00000  1.00000

P_GERMLINE plot

(Additionally, none of the toolbar buttons, such as bold, italic, and file upload, work on this 'ask a question' page where I am writing this question. Is it just me?)

Comments

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dayzcool
    Hi,

    That is interesting. We don't fully support tumor-only mode right now, but you can play around with the default filters in FilterMutectCalls. I think increasing --max_germline_posterior will help with the germline risk filter.

    How many variants are called before filtering? Are 100% of the variants filtered out because of "germline risk"? Can you post a few example records after filtering?

    Thanks,
    Sheila

  • Hi @Sheila,

    Thank you for your help! As you can see in the plot, the germline risk of most variants is 90+% in tumor-only mode. Thus, --max_germline_posterior needs to be set ridiculously high (e.g. 0.999) for variants to pass. At a max_germline_posterior of 0.5, which looks very large (the default value is 0.025?), there is still no passing variant in any of the 20 samples; some variants pass the germline_risk filter, but none pass all filters. The number of unfiltered variants in a sample is ~100,000.

    These samples are actually paired; Mutect gives about 1000 filter-passing variants per pair when the same samples are processed in tumor-normal mode.

    Here are a few variants from a tumor-only run:

    1       53504306        .       G       A       .       germline_risk;t_lod     DP=40;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-1.273e-02;TLOD=4.23   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:REF_F1R2:REF_F2R1:SA_MAP_AF:SA_POST_PROB   0/1:36,2:0.078:1:1:0.500:41,39:0,0:484,414:60,60:20,20:false:true:0.500:0.078:47.28:100.00:22:14:0.00,0.051,0.053:9.447e-03,0.025,0.965
    1       53504798        .       T       A       .       base_quality;germline_risk      DP=159;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=11.43 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:PGT:PID:REF_F1R2:REF_F2R1:SA_MAP_AF:SA_POST_PROB   0/1:152,5:0.038:5:0:1.00:41,12:0,0:408,414:60,60:29,44:false:false:.:.:49.33:100.00:0|1:53504798_T_A:66:86:0.030,0.00,0.032:3.606e-03,0.014,0.982
    1       53504799        .       ACTGTTCCC       A       .       germline_risk   DP=166;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=11.33 GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB       0/1:158,5:0.040:41,37:0,0:410,414:60,60:30,43:false:false:0|1:53504798_T_A:0.030,0.00,0.031:3.610e-03,0.013,0.983
    1       53504807        .       C       CAAAAAAAA       .       base_quality;germline_risk      DP=151;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;RPA=5,13;RU=A;STR;TLOD=11.98       GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB       0/1:143,6:0.047:41,12:0,0:414,414:60,60:28,34:false:false:0|1:53504798_T_A:0.030,0.00,0.034:4.192e-03,0.013,0.982
    1       53547800        .       AGAG    A       .       germline_risk;panel_of_normals;str_contraction  DP=261;ECNT=1;IN_PON;POP_AF=4.431e-04;P_GERMLINE=-4.896e-04;RPA=5,4;RU=GAG;STR;TLOD=19.36       GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB       0/1:224,8:0.087:41,41:0,0:420,418:60,60:35,40:false:false:0.030,0.010,0.034:2.315e-03,0.017,0.980
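As a quick check of the numbers in these records, here is a minimal Python sketch that pulls P_GERMLINE out of an INFO string and converts it from log10 scale to a probability. The helper names (`info_to_dict`, `germline_posterior`) are made up for illustration and are not part of GATK; the INFO strings are copied from the first and last records above.

```python
def info_to_dict(info):
    """Split a VCF INFO string into a dict; flag-only keys (e.g. STR) map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def germline_posterior(info):
    """Convert the log10-scaled P_GERMLINE annotation to a plain probability."""
    return 10 ** float(info_to_dict(info)["P_GERMLINE"])


# INFO strings copied from the first and last example records in this post.
examples = {
    "1:53504306": "DP=40;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-1.273e-02;TLOD=4.23",
    "1:53547800": "DP=261;ECNT=1;IN_PON;POP_AF=4.431e-04;P_GERMLINE=-4.896e-04;"
                  "RPA=5,4;RU=GAG;STR;TLOD=19.36",
}

posteriors = {site: germline_posterior(info) for site, info in examples.items()}

for site, p in posteriors.items():
    print(f"{site}\tgermline posterior = {p:.4f}")
```

Both sites come out with a germline posterior above 0.97, which is consistent with them still carrying germline_risk at a --max_germline_posterior of 0.5.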
    

    GitHub issue 2458, filed by Sheila (state: closed; closed by chandrans)
  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dayzcool
    Hi,

    It sounds like the germline risk filter is indeed supposed to be very stringent when running in tumor-only mode. The AF in the germline resource basically acts as a hard filter. For example, if the AF in the germline resource for a site is 0.2 and the AF of that site in your tumor sample is lower than that, the site will be filtered out.

    With a matched normal sample, the filter is not so stringent. The AF of the germline resource is also taken into account (it may not be as high as in the population).

    I hope that helps.

    -Sheila

    P.S. Are ~1000 variants passing out of ~100,000 even with the tumor-normal matched pairing?

  • @Sheila, thanks for your comments. Yes, the number of 'PASS' variants ranges from 100 to 1000 for the samples I ran through Mutect in tumor-normal mode.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dayzcool
    Hi,

    Thanks. Let me check with the team if that is appropriate. It seems a little low to me. Are you working with whole genomes or exomes?

    -Sheila

  • dayzcool, Member
    edited August 23

    @Sheila, these are exome samples. How many more filter-passing variants do you expect to see?

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dayzcool
    Hi,

    Yes, I was thinking it was a low number of passing variants.

    It turns out that af_of_alleles_not_in_resource defaults to 0.001, a value intended for runs with no germline resource. Since you supplied the gnomAD resource, you should set --af_of_alleles_not_in_resource to 1/400,000 (that is, 0.0000025).

    The rationale behind this (from the developer): the default of 0.001 is the average germline heterozygosity, which is appropriate when no germline resource is given. If a germline resource is given, af_of_alleles_not_in_resource should be set to 1 / (2 × the number of samples in the germline resource). The exonic regions in the gnomAD VCF have > 200,000 samples, while the rest of the genome has only ~16,000 samples. Thus af_of_alleles_not_in_resource should be 1/400,000 for WES and 1/32,000 for WGS. It is tricky to set a single default because of the difference in the number of genome and exome samples, so we just provide the default that assumes no germline resource.
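To make the arithmetic concrete, here is a small Python sketch of the 1 / (2 × N) rule described above. The function name `af_not_in_resource` is hypothetical, and the sample counts are the round figures quoted in this post rather than exact gnomAD totals.

```python
def af_not_in_resource(n_samples):
    """Return 1 / (2 * n_samples): one allele copy among 2N chromosomes."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / (2 * n_samples)


# Round sample counts quoted in the post (assumed, not exact gnomAD totals).
exome_af = af_not_in_resource(200_000)   # exonic regions (WES)
genome_af = af_not_in_resource(16_000)   # rest of the genome (WGS)

print(f"WES: --af_of_alleles_not_in_resource {exome_af:.10f}")
print(f"WGS: --af_of_alleles_not_in_resource {genome_af:.10f}")
```

This gives 2.5e-6 for exomes and 3.125e-5 for genomes, i.e. the 1/400,000 and 1/32,000 values written out as decimals.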

    I hope that makes sense.

    -Sheila

  • @Sheila, thank you for your kind explanation. It does make sense, and I will definitely reconfigure --af_of_alleles_not_in_resource.
    It seems that Mutect could possibly call 100-200 more variants that are currently filtered only by the germline_risk filter.

    FYI, here is the frequency of filters in one Mutect call set from tumor-normal mode (1085 PASS calls and 170 calls filtered only by germline_risk):
    https://drive.google.com/open?id=0B-jmIuKNMWYeUnkzUzluR2Q0WGM
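For anyone who wants to produce the same kind of tally, here is a minimal Python sketch that counts how often each filter appears in the FILTER column and how often each exact combination occurs. The function name `tally_filters` is made up for illustration. The demo input is the five tumor-only records posted earlier in this thread, with INFO shortened to DP only; it is not the tumor-normal call set behind the link above.

```python
from collections import Counter


def tally_filters(vcf_lines):
    """Count individual filters and exact filter combinations in VCF records."""
    per_filter = Counter()
    combos = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        columns = line.split(None, 7)      # the first 7 columns never contain spaces
        filters = columns[6].split(";")    # FILTER is the 7th column
        per_filter.update(filters)
        combos[";".join(sorted(filters))] += 1
    return per_filter, combos


# The five tumor-only records from earlier in the thread (INFO truncated to DP).
records = [
    "1\t53504306\t.\tG\tA\t.\tgermline_risk;t_lod\tDP=40",
    "1\t53504798\t.\tT\tA\t.\tbase_quality;germline_risk\tDP=159",
    "1\t53504799\t.\tACTGTTCCC\tA\t.\tgermline_risk\tDP=166",
    "1\t53504807\t.\tC\tCAAAAAAAA\t.\tbase_quality;germline_risk\tDP=151",
    "1\t53547800\t.\tAGAG\tA\t.\tgermline_risk;panel_of_normals;str_contraction\tDP=261",
]

per_filter, combos = tally_filters(records)
print("records with germline_risk:", per_filter["germline_risk"])
print("records filtered only by germline_risk:", combos["germline_risk"])
print("PASS records:", combos["PASS"])
```

On a real file, the same function can be fed an open file handle, since it skips the `#` header lines itself.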

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @dayzcool
    Hi,

    I see. Alright, let us know how setting --af_of_alleles_not_in_resource changes things. You may also try playing around with the other filters in FilterMutectCalls. Mutect2 is meant to be very sensitive so it does not miss any potential variation.

    -Sheila
