
[GATK4 beta] no filter-passing variants in Mutect2 tumor-only runs using default parameters

dayzcool · Member
edited August 2017 in Ask the GATK team

Hello,

I would like to ask your advice on the tumor-only mode of Mutect2.
I ran Mutect2 from GATK4 beta.3 on 20 tumor samples in tumor-only mode and found no variants passing filters: every variant is filtered out by FilterMutectCalls. The germline risk seems to be estimated very high overall.
I ran Mutect2 with scripts/mutect2_wdl/mutect2_multi_sample.wdl from the GATK source repository, using gnomAD as the population allele frequency (AF) source and default parameters otherwise.
I'd appreciate any help with running Mutect2 in tumor-only mode.

FYI, here is the distribution of 10^P_GERMLINE for one tumor sample (P_GERMLINE is the INFO field holding the log10 posterior probability that the alt allele is a germline variant). Outliers are not plotted, for simplicity.

summary(10^P_GERMLINE):

    Min.     1st Qu.  Median   Mean     3rd Qu.  Max.
    0.04699  0.93345  0.99919  0.94155  1.00000  1.00000

[Figure: P_GERMLINE plot]
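Since P_GERMLINE is stored as a log10 posterior, a summary like the one above needs the values converted back to probabilities first. A minimal Python sketch (standard library only); the input is the P_GERMLINE values of the five tumor-only example records posted in this thread, not a full sample, and `statistics.quantiles(..., method="inclusive")` is used because it should interpolate the same way as R's default quantiles. The function names are hypothetical.

```python
import statistics

def germline_posterior(p_germline_log10: float) -> float:
    """Convert a P_GERMLINE INFO value (a log10 posterior) to a probability."""
    return 10.0 ** p_germline_log10

def summarize(values):
    """R-style summary(): Min., 1st Qu., Median, Mean, 3rd Qu., Max."""
    # method="inclusive" interpolates like R's default (type 7) quantiles
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }

# P_GERMLINE values of the five tumor-only example records posted in this thread
p_germline = [-1.273e-02, -2.169e-04, -2.169e-04, -2.169e-04, -4.896e-04]
posteriors = [germline_posterior(p) for p in p_germline]
for name, value in summarize(posteriors).items():
    print(f"{name:<8} {value:.5f}")
```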

(Also, none of the toolbar buttons, such as bold, italic and file upload, work on the 'Ask a question' page where I am writing this. Is it just me?)

Comments

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @dayzcool
    Hi,

    That is interesting. We don't fully support tumor-only mode right now, but you can play around with the default filters in FilterMutectCalls. I think increasing --max_germline_posterior will help with the germline risk filter.

    How many variants are called before filtering? Are 100% of the variants filtered out because of "germline risk"? Can you post a few example records after filtering?

    Thanks,
    Sheila

  • Hi @Sheila,

    Thank you for your help! As you can see in the plot, the germline risk of most variants is 90% or more in tumor-only mode. Thus, --max_germline_posterior would have to be set ridiculously high (e.g. 0.999) for variants to pass. Even with max_germline_posterior at 0.5, which already seems very large (the default is 0.025?), there are still no passing variants across the 20 samples; some variants pass the germline_risk filter, but not all filters. Each sample has ~100,000 unfiltered variants.

    These samples are actually paired; Mutect2 gives about 1000 filter-passing variants per pair when the same samples are processed in tumor-normal mode.

    Here are a few variants from a tumor-only run:

    1       53504306        .       G       A       .       germline_risk;t_lod     DP=40;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-1.273e-02;TLOD=4.23   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:REF_F1R2:REF_F2R1:SA_MAP_AF:SA_POST_PROB   0/1:36,2:0.078:1:1:0.500:41,39:0,0:484,414:60,60:20,20:false:true:0.500:0.078:47.28:100.00:22:14:0.00,0.051,0.053:9.447e-03,0.025,0.965
    1       53504798        .       T       A       .       base_quality;germline_risk      DP=159;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=11.43 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:OBF:OBP:OBQ:OBQRC:PGT:PID:REF_F1R2:REF_F2R1:SA_MAP_AF:SA_POST_PROB   0/1:152,5:0.038:5:0:1.00:41,12:0,0:408,414:60,60:29,44:false:false:.:.:49.33:100.00:0|1:53504798_T_A:66:86:0.030,0.00,0.032:3.606e-03,0.014,0.982
    1       53504799        .       ACTGTTCCC       A       .       germline_risk   DP=166;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=11.33 GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB       0/1:158,5:0.040:41,37:0,0:410,414:60,60:30,43:false:false:0|1:53504798_T_A:0.030,0.00,0.031:3.610e-03,0.013,0.983
    1       53504807        .       C       CAAAAAAAA       .       base_quality;germline_risk      DP=151;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;RPA=5,13;RU=A;STR;TLOD=11.98       GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:PGT:PID:SA_MAP_AF:SA_POST_PROB       0/1:143,6:0.047:41,12:0,0:414,414:60,60:28,34:false:false:0|1:53504798_T_A:0.030,0.00,0.034:4.192e-03,0.013,0.982
    1       53547800        .       AGAG    A       .       germline_risk;panel_of_normals;str_contraction  DP=261;ECNT=1;IN_PON;POP_AF=4.431e-04;P_GERMLINE=-4.896e-04;RPA=5,4;RU=GAG;STR;TLOD=19.36       GT:AD:AF:MBQ:MCL:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB       0/1:224,8:0.087:41,41:0,0:420,418:60,60:35,40:false:false:0.030,0.010,0.034:2.315e-03,0.017,0.980
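The records above show why a very high --max_germline_posterior is needed: 10^P_GERMLINE is above 0.97 for all of them. A minimal Python sketch of the germline_risk decision, under the assumption (not taken from the GATK source) that an allele escapes germline_risk only when its germline posterior does not exceed --max_germline_posterior. It compares in log10 space, so P_GERMLINE is used as stored. The INFO strings are copied from three of the records above; the function names are hypothetical.

```python
import math

def parse_info(info):
    """Split a VCF INFO string into a dict; flag keys such as STR map to True."""
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields

def passes_germline_filter(info, max_germline_posterior):
    """Assumed rule: the first alt allele escapes germline_risk when its germline
    posterior 10**P_GERMLINE does not exceed max_germline_posterior."""
    log10_posterior = float(parse_info(info)["P_GERMLINE"].split(",")[0])
    return log10_posterior <= math.log10(max_germline_posterior)

# INFO fields copied from three of the tumor-only records above
records = {
    "1:53504306": "DP=40;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-1.273e-02;TLOD=4.23",
    "1:53504798": "DP=159;ECNT=3;POP_AF=1.000e-03;P_GERMLINE=-2.169e-04;TLOD=11.43",
    "1:53547800": ("DP=261;ECNT=1;IN_PON;POP_AF=4.431e-04;P_GERMLINE=-4.896e-04;"
                   "RPA=5,4;RU=GAG;STR;TLOD=19.36"),
}
for threshold in (0.5, 0.999):
    passing = [site for site, info in records.items()
               if passes_germline_filter(info, threshold)]
    print(f"max_germline_posterior={threshold}: {passing}")
```

Even when a record clears this threshold, it can still fail other filters (for example t_lod or panel_of_normals), which matches the observation that some variants pass germline_risk but not all filters.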
    

    [GitHub issue #2458, filed by Sheila; closed by chandrans]
  • Sheila · Broad Institute · Member, Broadie, Moderator

    @dayzcool
    Hi,

    It sounds like the germline risk filter is indeed meant to be very stringent in tumor-only mode. The AF in the germline resource basically acts as a hard filter. For example, if the AF in the germline resource for a site is 0.2 and the AF of the site in your tumor sample is lower than that, the site will be filtered out.

    With a matched normal sample, the filter is not so stringent. The AF of the germline resource is also taken into account (it may not be as high as in the population).

    I hope that helps.

    -Sheila

    P.S. Are ~1000 variants passing out of ~100,000 even with the tumor-normal matched pairing?

  • @Sheila, thanks for your comments. Yes, the number of PASS variants ranges from 100 to 1000 for the samples I ran in tumor-normal mode.

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @dayzcool
    Hi,

    Thanks. Let me check with the team if that is appropriate. It seems a little low to me. Are you working with whole genomes or exomes?

    -Sheila

  • dayzcool · Member
    edited August 2017

    @Sheila, these are exome samples. How many more filter-passing variants do you expect to see?

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @dayzcool
    Hi,

    Yes, I was thinking it was a low number of passing variants.

    It turns out that af_of_alleles_not_in_resource defaults to 0.001, which is the appropriate value when you do not provide a germline resource. In your case, you have provided the gnomAD resource, so you should set --af_of_alleles_not_in_resource to 1/400,000 (i.e., 0.0000025).

    The rationale behind this (from the developer): af_of_alleles_not_in_resource defaults to 0.001, the average germline heterozygosity, which is appropriate when no germline resource is given. If a germline resource is given, af_of_alleles_not_in_resource should be set to 1/(2 × number of samples in the germline resource). The exonic regions in the gnomAD VCF have > 200,000 samples, while the rest of the genome has only ~16,000 samples. Thus af_of_alleles_not_in_resource should be 1/400,000 for WES and 1/32,000 for WGS. Choosing a default is tricky because of the difference in the numbers of genome and exome samples, so we just provide a default that assumes no germline resource.
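The developer's rule of thumb, af_of_alleles_not_in_resource = 1/(2 × number of samples in the resource), as a small Python helper. The sample counts are the ones quoted in this thread (gnomAD exomes > 200,000, gnomAD genomes ~16,000, and the 722-individual dog resource used later in the thread); the docstring's reading of the formula (one allele among the 2N chromosomes of N diploid samples) is an interpretation, not the developer's wording, and the function name is hypothetical.

```python
def af_of_alleles_not_in_resource(num_samples: int) -> float:
    """Rule of thumb 1 / (2 * N): roughly one allele among the 2N chromosomes
    of N diploid samples, i.e. about the smallest AF the resource could list."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return 1.0 / (2 * num_samples)

resources = [
    ("gnomAD exomes (WES)", 200_000),
    ("gnomAD genomes (WGS)", 16_000),
    ("722-individual dog resource", 722),
]
for label, n in resources:
    print(f"{label}: --af-of-alleles-not-in-resource {af_of_alleles_not_in_resource(n):.3g}")
```

For the 722-individual resource this gives about 0.000693, the value used in the dog-genome commands later in this thread.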

    I hope that makes sense.

    -Sheila

  • @Sheila, thank you for the kind explanation. It makes sense, and I will definitely reconfigure --af_of_alleles_not_in_resource.
    It seems that Mutect2 could call another 100-200 variants that are currently filtered only by the germline_risk filter.

    FYI, here are the filter frequencies for one Mutect2 run in tumor-normal mode (1085 PASS calls and 170 calls filtered only by germline_risk):
    https://drive.google.com/open?id=0B-jmIuKNMWYeUnkzUzluR2Q0WGM
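Counts like "PASS calls" and "calls filtered only by germline_risk" can be tallied directly from a FilterMutectCalls output VCF. A minimal Python sketch using only the standard library: it reads the FILTER column (the 7th tab-separated field), and the demo feeds it the FILTER values of the tumor-only records posted earlier in this thread instead of a real file. The function names and the file name in the comment are hypothetical.

```python
import gzip
from collections import Counter

def iter_records(path):
    """Yield data lines from a plain or bgzipped VCF (bgzip output is gzip-readable)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield line.rstrip("\n")

def tally_filters(records):
    """Count PASS calls, calls per filter, and calls failing exactly one filter."""
    per_filter, only = Counter(), Counter()
    n_pass = 0
    for record in records:
        filters = record.split("\t")[6].split(";")
        if filters == ["PASS"]:
            n_pass += 1
            continue
        per_filter.update(filters)
        if len(filters) == 1:
            only[filters[0]] += 1
    return n_pass, per_filter, only

# FILTER values of the five tumor-only records posted earlier in this thread
thread_filters = [
    "germline_risk;t_lod",
    "base_quality;germline_risk",
    "germline_risk",
    "base_quality;germline_risk",
    "germline_risk;panel_of_normals;str_contraction",
]
demo = ["\t".join(["1", str(pos), ".", "N", "N", ".", flt, "."])
        for pos, flt in enumerate(thread_filters, start=1)]
n_pass, per_filter, only = tally_filters(demo)
print(n_pass, per_filter.most_common(), dict(only))
# On a real file: tally_filters(iter_records("sample_filtered.vcf.gz"))
```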

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @dayzcool
    Hi,

    I see. Alright, let us know how setting --af_of_alleles_not_in_resource changes things. You may also try playing around with the other filters in FilterMutectCalls. Mutect2 is meant to be very sensitive so it does not miss any potential variation.

    -Sheila

  • ncamarda · DFCI · Member, Broadie

    I have been having a similar problem, and it's unclear to me how to choose the 'best' value of --max_germline_posterior. I'm using WGS samples, and I've made sure to adjust --af_of_alleles_not_in_resource to reflect that I'm using WGS and gnomAD. With the default --max_germline_posterior, I'm left with about 20 mutations per sample (in a cohort of 26). But when I raise --max_germline_posterior to 0.999, I get ~600K per sample on average. How should I go about finding the sweet spot here? How do I know I've found it? Any guidance at all is greatly appreciated. Thank you!

    [GitHub issue #2972, filed by Sheila; closed by chandrans]
  • Sheila · Broad Institute · Member, Broadie, Moderator

    @ncamarda
    Hi,

    Interesting. I need to check with the developers and get back to you.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @ncamarda
    Hi again,

    From the developer:

    "I think that for tumor-only calling setting af-of-alleles-not-in-resource to 0 is fine. The germline posterior probabilities for alleles not in gnomAD won't be right, but that's not the important thing. Basically this setting amounts to a guarantee not to filter variants that are not in gnomAD, which is sensible.

    I am currently testing a code change that uses knowledge of the local allele fraction of germline hets to greatly improve germline filtering. The basic idea is that if the allele fraction is, say, 0.2, and a germline het would have allele fraction 0.5 then the variant is most likely somatic. This change should be out within a few weeks."
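Both points in the developer's note (a zero prior means the filter can never fire, and an allele fraction far from 0.5 argues against a germline het) can be illustrated with a toy two-hypothesis Bayes calculation. This is not Mutect2's model: it assumes a germline het always has allele fraction 0.5 and that a somatic allele fraction is uniform on [0, 1], which makes the somatic marginal likelihood exactly 1/(depth + 1). Here `prior` loosely plays the role of the population AF (or af-of-alleles-not-in-resource for alleles absent from the resource). All names are hypothetical.

```python
from math import comb

def germline_het_likelihood(alt: int, depth: int) -> float:
    """P(alt reads | depth) if the variant is a germline het (allele fraction 0.5)."""
    return comb(depth, alt) * 0.5 ** depth

def somatic_likelihood(depth: int) -> float:
    """P(alt reads | depth) if the somatic allele fraction is uniform on [0, 1].

    Integrating the binomial over a uniform allele fraction gives 1 / (depth + 1)
    for every alt count.
    """
    return 1.0 / (depth + 1)

def toy_germline_posterior(alt: int, depth: int, prior: float) -> float:
    """Posterior of 'germline het' versus 'somatic' in this toy model."""
    g = prior * germline_het_likelihood(alt, depth)
    s = (1.0 - prior) * somatic_likelihood(depth)
    return g / (g + s)

for alt, prior in [(25, 0.1), (10, 0.1), (25, 0.0)]:
    p = toy_germline_posterior(alt, depth=50, prior=prior)
    print(f"alt={alt}/50, prior={prior}: germline posterior = {p:.3g}")
```

With 10 of 50 reads supporting the alt allele (AF 0.2), the germline posterior stays tiny even with a prior of 0.1, while 25 of 50 (AF 0.5) gives a much larger value; with a prior of 0 the posterior is exactly 0 regardless of the reads.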

    I hope this helps.

    -Sheila

  • Hi,

    I am having a similar issue: after Mutect2 filtering, none of the variants pass. This is happening with both tumor-only and matched-normal data.

    I am using the following resources for the dog genome:
    1. 68-sample PON
    2. Germline resource with 722 individuals and --af-of-alleles-not-in-resource = 0.000693

    Mutect2 command:

    gatk  Mutect2  \
    --reference Reference/genome_ref.fasta  \
    --input L1_dedup.recal_pass1.bam \
    --tumor-sample L1 \
    --panel-of-normals PON/68sample_PON.vcf.gz \
    --germline-resource  /VCF/722g.990.SNP.INDEL.chrAll.vcf.gz  \
    --af-of-alleles-not-in-resource 0.000693  \
    --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter \
    --intervals  chromosome.list \
    --output L1_somatic_m2.vcf.gz \
    --bam-output L1_somatic_m2.bam
    
    
    gatk FilterMutectCalls \
    -V L1_somatic_m2.vcf.gz \
    --contamination-table L1_contamination.table \
    -O L1_somatic_filter1.vcf.gz
    

    I got 0 PASS calls in this run. Following the suggestions above, I set --af-of-alleles-not-in-resource to 0 in a separate run, but I still do not get any PASS calls.

    I am not sure to what extent the contamination estimate affects this, but I get 0 contamination for most of my samples.

    level    contamination  error
    whole_bam    0.0      0.0
    

    Below is the high-level summary of filters applied with af-of-alleles-not-in-resource=0 and af-of-alleles-not-in-resource=0.000693.

    Filter              af-of-alleles-not-in-resource=0    af-of-alleles-not-in-resource=0.000693
    Total calls         1,854,904                          1,854,904
    clustered_events    1,076,192                          1,076,192
    germline_risk       1,667,184                          1,854,904
    panel_of_normals    1,388,873                          1,388,873
    mapping_quality     58,431                             58,431

    Do you have further suggestions? Is changing --max_germline_posterior recommended together with af-of-alleles-not-in-resource=0?

    I have separate matched-normal data that suffers from the same issue. With af-of-alleles-not-in-resource=0.000693 and the matched normal, none of the calls pass. Most calls are filtered by clustered_events and artifact_in_normal.

    Do you have further suggestions for this?

  • vsvinti · Member

    Ah, thankfully I'm not the only one with this issue.
    I am wondering whether this is a bug in FilterMutectCalls in GATK4?

    I am working with mouse whole genomes, and I have called matched tumor-normal pairs with a germline resource (called in house) and default parameters. My numbers aren't big (up to 10 pairs), but I ran the GATK3 MuTect2 pipeline on our first batch (of ~5) about a year ago, and I never had problems getting variants to PASS.

    Now I have reprocessed the data, together with an additional batch of paired samples, using GATK 4.0.2.1 and 4.0.3.0, and I am not getting anything to PASS. Nothing has changed, at least for the batch-1 samples that were re-run.

  • Thanks @vsvinti for your comments. I am also a bit relieved that it's a common issue, and hopefully Team GATK can find us a good solution. My initial run was with GATK 4.0.0.0, but I see that the issue persists in GATK 4.0.3.0.

    In both cases the data are non-human, and I am wondering whether we need to set some parameters differently for this. In the meantime, I will revisit the older version of Mutect to call the SNPs.

    Awaiting response/suggestions from Team GATK.

  • vsvinti · Member

    What are your suggestions, @Sheila ?

  • Sheila · Broad Institute · Member, Broadie, Moderator
    edited April 18

    @vsvinti @sutturka
    Hi,

    Can you post some example records that are failing? @vsvinti Do you have some sites that passed in GATK3 that are now failing in GATK4? Can you post those?

    Also, can you comment on which filters seem to be failing most of the variants?

    Thanks,
    Sheila

  • vsvinti · Member

    @Sheila

    Here is a handful of records for the same matched pair run with the two versions:

    # gatk3
    1   37313503    .   T   C   .   PASS    ECNT=1;HCNT=4;MAX_ED=.;MIN_ED=.;NLOD=7.08;TLOD=8.55 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:28,9:0.243:7:2:0.778:870,134:12:16  0/0:25,0:0.00:0:0:.:736,0:14:11
    1   38693916    .   C   T   .   PASS    ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=6.56;TLOD=6.56 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:26,3:0.103:1:2:0.667:858,99:11:15   0/0:22,0:0.00:0:0:.:685,0:3:19
    1   41742878    .   A   C   .   PASS    ECNT=1;HCNT=9;MAX_ED=.;MIN_ED=.;NLOD=2.63;TLOD=9.16 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:28,1:0.097:0:1:1.00:840,30:15:13    0/0:32,1:0.057:1:0:0.00:997,19:10:22
    1   48613890    .   G   T   .   PASS    ECNT=1;HCNT=10;MAX_ED=.;MIN_ED=.;NLOD=17.87;TLOD=6.40   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:32,3:0.086:2:1:0.667:997,94:16:16   0/0:60,0:0.00:0:0:.:1883,0:32:28
    1   51717580    .   C   T   .   PASS    ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=8.92;TLOD=6.70 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:33,3:0.086:2:1:0.333:879,97:15:18   0/0:52,1:0.020:0:1:1.00:1457,34:28:24
    
    # gatk4
    1   37313503    .   T   C   .   base_quality;germline_risk  DP=78;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-2.195e-04;TLOD=7.92   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB ./. 0/1:31,10:0.245:14,7:17,3:8:400,404:60:20:0.00,0.242,0.244:0.503,3.391e-03,0.494
    1   38693916    .   C   T   .   germline_risk;t_lod DP=71;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-7.176e-02;TLOD=3.45   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB ./. 0/1:24,2:0.202:10,0:14,2:33:427,483:60:20:0.081,0.00,0.077:0.019,0.014,0.967
    1   41742878    .   A   C   .   clustered_events;germline_risk  DP=70;ECNT=4;POP_AF=1.000e-03;P_GERMLINE=-2.179e-04;TLOD=8.34   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB ./. 0/1:28,3:0.097:15,2:13,1:30:459,437:60:9:0|1:41742863_G_GCCCCC:0.00,0.101,0.097:0.549,2.149e-03,0.449
    1   48613890    .   G   T   .   germline_risk   DP=106;ECNT=2;POP_AF=1.000e-03;P_GERMLINE=-1.152e-03;TLOD=5.37  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB ./. 0/1:34,3:0.108:16,2:18,1:35:443,441:60:61:0.00,0.081,0.081:0.032,7.977e-03,0.960
    1   51717580    .   C   T   .   germline_risk   DP=77;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-6.225e-04;TLOD=5.73   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB ./. 0/1:33,3:0.092:14,2:19,1:32:440,430:60:20:0.071,0.051,0.083:8.337e-03,0.025,0.967
    

    I haven't looked at all the annotations, but every variant seems to be assigned germline_risk (no records remain after grep -v germline_risk). I have run both versions with and without a germline resource, and the problem is the same. For the germline resource, I generated a set of germline calls from my normals with HaplotypeCaller, as there isn't a known-variants resource available. With GATK3, I also ran versions with a PoN, and regardless of the details, I always got variants to PASS.

    I must say I don't understand the difference between providing the normal germline as a resource, or as a PON, but that's off topic. It's not making a difference to the PASS problem.

  • Hi @vsvinti, your GATK4 calls look like my gatk-4.0.2.1 calls made in tumor-only mode: germline_risk is estimated very high for every variant, and the INFO column has no NLOD or N_ART_LOD. IMHO, it may be worth double-checking that GATK4 is actually running in tumor-normal mode.
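The check suggested here (GATK4 tumor-normal records carry NLOD and N_ART_LOD in INFO, tumor-only records do not) can be scripted. A minimal Python sketch; it is a heuristic based on the records posted in this thread, not a documented GATK guarantee, and the function name is hypothetical. The two demo records are abridged from one GATK4 record of each kind shown in this thread.

```python
def looks_tumor_only(vcf_lines) -> bool:
    """Heuristic: True if no data record carries NLOD or N_ART_LOD in INFO
    (also True for a file with no records at all)."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        info = line.rstrip("\n").split("\t")[7]
        keys = {entry.partition("=")[0] for entry in info.split(";")}
        if keys & {"NLOD", "N_ART_LOD"}:
            return False
    return True

# Abridged from GATK4 records posted in this thread (tab-separated, as in a real VCF)
tumor_only = "\t".join(["1", "37313503", ".", "T", "C", ".", "germline_risk",
                        "DP=78;ECNT=1;POP_AF=1.000e-03;P_GERMLINE=-2.195e-04;TLOD=7.92"])
tumor_normal = "\t".join(["chr1", "281", ".", "G", "T", ".", "mapping_quality",
                          "DP=46;ECNT=1;NLOD=5.72;N_ART_LOD=-1.301e+00;TLOD=5.82"])
print(looks_tumor_only(["##fileformat=VCFv4.2", tumor_only]))
print(looks_tumor_only([tumor_only, tumor_normal]))
```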

    In addition, I'd like to share the GATK doc on PON and germline resource, which I found very helpful:
    Somatic calling is NOT simply a difference between two callsets
    Differences between GATK3 MuTect2 and GATK4 Mutect2

  • vsvinti · Member
    edited April 19

    @dayzcool

    Thanks a million for your input. Looking at my command again, you may be right! I give it two BAM files, but I assumed I only needed to tell it which one is the tumor and it would figure out that the other one is the normal. Perhaps it doesn't, and it's running in tumor-only mode, so maybe I need to specify --normal-sample. I should have thought of that before!

    $gatk --java-options "-Xmx8g" Mutect2 \
    -R $reference_seq \
    -I $bam_dir/$tumor_bam \
    -I $bam_dir/$normal_bam \
    --tumor-sample $tumor \
    -O $vcf_dir/$batch.$chr.vcf.gz \
    -L $chr \
    --germline-resource $germ_resource
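If the missing --normal-sample is the culprit, a guard in whatever script launches Mutect2 can catch it before a long run. A minimal Python sketch that only assembles the argv list (it does not run GATK); --tumor-sample and --normal-sample are the GATK4 Mutect2 arguments already named in this thread, while the function name, paths and sample names are hypothetical placeholders. The sample names have to match the SM read-group tags in the BAMs.

```python
import shlex

def mutect2_paired_command(reference, tumor_bam, normal_bam, tumor_sample,
                           normal_sample, output_vcf, interval=None,
                           germline_resource=None):
    """Build a GATK4 Mutect2 tumor-normal command as an argv list.

    Refuses to build a two-BAM command without a normal sample name, since the
    run discussed in this thread appears to have been treated as tumor-only.
    """
    if not normal_sample:
        raise ValueError("normal_sample is required for tumor-normal calling")
    argv = ["gatk", "--java-options", "-Xmx8g", "Mutect2",
            "-R", reference,
            "-I", tumor_bam, "-I", normal_bam,
            "--tumor-sample", tumor_sample,
            "--normal-sample", normal_sample,
            "-O", output_vcf]
    if interval:
        argv += ["-L", interval]
    if germline_resource:
        argv += ["--germline-resource", germline_resource]
    return argv

cmd = mutect2_paired_command("ref.fasta", "tumor.bam", "normal.bam",
                             "TUMOR", "NORMAL", "out.vcf.gz", interval="1")
print(shlex.join(cmd))  # to execute: subprocess.run(cmd, check=True)
```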
    

    Thanks for the very useful links above; there's so much documentation that I hadn't come across those particular ones. I haven't seen any docs suggesting generating a germline resource from the normals with HaplotypeCaller, but I thought it would be all right for deriving AFs. Perhaps not?!

    According to the second link
    "The tool filters variant sites in the panel of normals (PoN) or the matched normal."
    so perhaps there's no benefit in doing both?!

  • I have two sets of data.

    Tumor only - VCF generation and FilterMutectCalls run with GATK 4.0.0.0

    Tumor-Normal pair - VCF generation and FilterMutectCalls run with GATK 4.0.3.0

    Both data sets are run with --af-of-alleles-not-in-resource = 0.000693

    In both cases, I do not get any PASS calls after FilterMutectCalls steps.

    ## First 10 records from two samples from  **Tumor only**  data
    
    # Sample1
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample1
    chr1    309     .       T       G       .       germline_risk;t_lod     DP=3;ECNT=2;POP_AF=0.095;P_GERMLINE=-7.980e-04;TLOD=3.41        GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:0,2:0.799:0,0:0,2:20:0,990:30:20:0.00,0.990,1.00:0.029,0.025,0.946
    chr1    313     .       G       T       .       germline_risk   DP=2;ECNT=2;POP_AF=0.113;P_GERMLINE=-1.740e-06;TLOD=7.45        GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:0,2:1.000:0,0:0,2:36:0,990:30:24:0.00,0.990,1.00:0.029,0.025,0.946
    chr1    458     .       C       T       .       clustered_events;germline_risk;mapping_quality  DP=14;ECNT=11;POP_AF=0.251;P_GERMLINE=-5.977e-07;TLOD=19.56     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:6,8:0.571:2,5:4,3:38:732,952:24:25:0.525,0.566,0.571:0.024,0.028,0.948
    chr1    478     .       A       G       .       clustered_events;germline_risk;mapping_quality  DP=29;ECNT=11;POP_AF=0.611;P_GERMLINE=-7.055e-08;TLOD=28.30     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:12,13:0.513:8,7:4,6:36:903,731:25:17:0.00,0.495,0.500:0.029,0.025,0.946
    chr1    544     .       T       A       .       clustered_events;germline_risk;mapping_quality;t_lod    DP=63;ECNT=11;POP_AF=6.930e-04;P_GERMLINE=-3.641e-02;TLOD=3.92  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:60,2:0.039:29,2:31,0:37:729,685:25:28:0.00,0.030,0.032:6.348e-03,0.025,0.968
    
    
    # Sample2
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Sample2
    chr1    289     .       A       G       .       germline_risk;mapping_quality;t_lod     DP=19;ECNT=1;POP_AF=0.013;P_GERMLINE=-3.867e-04;TLOD=4.65       GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:16,3:0.159:12,0:4,3:34:470,1034:23:22:0.00,0.162,0.158:0.021,0.022,0.958
    chr1    453     .       C       A       .       clustered_events;germline_risk;mapping_quality;t_lod    DP=34;ECNT=16;POP_AF=0.048;P_GERMLINE=-5.880e-04;TLOD=3.87      GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:30,3:0.103:15,1:15,2:37:553,745:24:11:0.00,0.091,0.091:0.017,0.015,0.967
    chr1    458     .       C       T       .       clustered_events;germline_risk  DP=38;ECNT=16;POP_AF=0.251;P_GERMLINE=-5.977e-07;TLOD=14.28     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:31,6:0.171:15,4:16,2:37:563,972:33:17:0.00,0.162,0.162:0.027,0.015,0.958
    chr1    478     .       A       G,T     .       clustered_events;germline_risk;mapping_quality;multiallelic     DP=54;ECNT=16;POP_AF=0.611,0.143;P_GERMLINE=-7.055e-08,-1.295e-06;TLOD=30.42,7.72       GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1/2:30,14,4:0.290,0.106:15,6,4:15,8,0:37,37:567,727,857:25,31:15,19:0.00,0.293,0.292:0.045,0.015,0.941
    chr1    550     .       C       A       .       clustered_events;germline_risk;mapping_quality  DP=147;ECNT=16;POP_AF=6.930e-04;P_GERMLINE=-3.130e-04;TLOD=24.15        GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB     0/1:137,9:0.065:64,3:73,6:37:710,976:25:52:0|1:550_C_A:0.00,0.061,0.062:0.010,0.015,0.974
    
    

    Below is the summary of filters applied:

    Filter                  Sample1      Sample2
    Total mutation calls    2,324,736    2,313,850
    germline_risk           2,324,736    2,313,850
    panel_of_normals        1,672,299    1,686,141
    clustered_events        1,405,025    1,365,637
    multiallelic            108,791      123,992
    mapping_quality         108,313      89,788
    t_lod                   88,012       76,080
    str_contraction         69,172       72,481
    read_position           58,717       58,139
    base_quality            43,321       44,185
    strand_artifact         11,257       9,434
    fragment_length         6,348        5,846
    artifact_in_normal      0            0
    contamination           0            0
    duplicate_evidence      0            0

    In the tumor-only data, germline_risk was applied to every call, followed by panel_of_normals and clustered_events.

    For a different tumor-only sample, I tried af-of-alleles-not-in-resource = 0 (see the post above); this reduced the number of germline_risk calls, but there were still no PASS calls.

    ## First 10 records from two samples from  **Tumor-Normal pair**  data
    
    # SampleA
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SampleA      SampleA_N
    chr1    281     .       G       T       .       mapping_quality DP=46;ECNT=1;NLOD=5.72;N_ART_LOD=-1.301e+00;POP_AF=0.018;P_CONTAM=4.565e-07;P_GERMLINE=-4.365e+00;TLOD=5.82     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:23,4:0.148:12,2:11,2:37:360,370:24:21:0.00,0.152,0.148:0.037,0.012,0.951    0/0:19,0:5.252e-04:6,0:13,0:0:382,0:0:0
    chr1    784     .       C       T       .       clustered_events        DP=45;ECNT=6;NLOD=4.82;N_ART_LOD=-1.230e+00;POP_AF=0.096;P_CONTAM=8.890e-03;P_GERMLINE=-4.034e+00;TLOD=8.46     GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:26,3:0.103:14,1:12,2:38:439,467:31:29:0.00,0.101,0.103:0.014,0.025,0.960    0/0:16,0:2.965e-05:6,0:10,0:0:437,0:0:0
    chr1    789     .       G       C       .       clustered_events;t_lod  DP=49;ECNT=6;NLOD=5.12;N_ART_LOD=-1.255e+00;POP_AF=6.930e-04;P_CONTAM=7.524e-05;P_GERMLINE=-7.286e+00;TLOD=4.98 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB     0/1:29,3:0.090:14,2:15,1:36:443,420:40:33:0|1:789_G_C:0.00,0.091,0.094:0.014,0.025,0.961        0/0:17,0:2.976e-05:6,0:11,0:0:434,0:0:0:0|1:789_G_C
    chr1    796     .       C       A       .       clustered_events;t_lod  DP=58;ECNT=6;NLOD=5.12;N_ART_LOD=-1.000e+00;POP_AF=3.766e-03;P_CONTAM=4.125e-04;P_GERMLINE=-6.211e+00;TLOD=4.97 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB     0/1:29,3:0.171:14,2:15,1:37:438,420:40:40:0|1:789_G_C:0.00,0.091,0.094:0.014,0.025,0.961        0/0:17,0:0.028:6,0:11,0:0:434,0:0:0:0|1:789_G_C
    chr1    866     .       C       G       .       artifact_in_normal;clustered_events     DP=249;ECNT=6;NLOD=12.35;N_ART_LOD=6.67;POP_AF=1.486e-03;P_CONTAM=5.625e-07;P_GERMLINE=-4.651e+01;TLOD=11.43    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB     0/1:152,5:0.035:69,4:83,1:38:389,367:40:11:0|1:866_C_G:0.00,0.030,0.032:5.294e-03,0.026,0.969   0/0:87,4:0.044:39,2:48,2:38:395,437:33:39:0|1:866_C_G
    
    
    # SampleB
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SampleB      SampleB_N
    chr1    784     .       C       T       .       artifact_in_normal;clustered_events     DP=102;ECNT=6;NLOD=2.88;N_ART_LOD=3.61;POP_AF=0.096;P_CONTAM=0.00;P_GERMLINE=-1.210e+01;TLOD=5.56       GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:63,3:0.053:30,1:33,2:38:459,463:33:22:0.00,0.051,0.045:7.784e-03,0.025,0.967        0/0:33,2:0.058:13,0:20,2:37:465,430:35:19
    chr1    796     .       C       T       .       artifact_in_normal;clustered_events;mapping_quality;t_lod       DP=114;ECNT=6;NLOD=5.01;N_ART_LOD=1.74;POP_AF=0.048;P_CONTAM=0.00;P_GERMLINE=-1.916e+01;TLOD=3.64       GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:75,2:0.026:36,1:39,1:37:455,446:29:20:0.00,0.030,0.026:5.265e-03,0.026,0.969        0/0:35,2:0.054:13,0:22,2:34:463,426:27:1
    chr1    797     .       A       C       .       artifact_in_normal;clustered_events     DP=115;ECNT=6;NLOD=2.98;N_ART_LOD=3.75;POP_AF=4.596e-03;P_CONTAM=0.00;P_GERMLINE=-1.414e+01;TLOD=8.80   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:72,5:0.073:36,1:36,4:36:455,461:40:21:0.00,0.061,0.065:9.735e-03,0.025,0.965        0/0:35,2:0.054:12,1:23,1:36:460,507:40:30
    chr1    822     .       T       C       .       artifact_in_normal;clustered_events;mapping_quality;t_lod       DP=176;ECNT=6;NLOD=13.26;N_ART_LOD=0.952;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-4.094e+01;TLOD=3.10 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:115,2:0.017:52,2:63,0:37:439,400:24:33:0.00,0.020,0.017:3.617e-03,0.026,0.971       0/0:58,1:0.017:23,1:35,0:35:448,374:24:2
    chr1    902     .       T       A       .       artifact_in_normal;clustered_events;mapping_quality     DP=231;ECNT=6;NLOD=12.97;N_ART_LOD=0.605;POP_AF=0.042;P_CONTAM=0.00;P_GERMLINE=-2.853e+01;TLOD=15.24    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:119,13:0.155:61,2:58,11:37:424,383:25:60:0.00,0.101,0.098:0.013,0.025,0.962 0/0:62,3:0.112:26,2:36,1:38:424,346:24:66
    chr1    951     .       T       G       .       clustered_events        DP=124;ECNT=6;NLOD=10.83;N_ART_LOD=-1.362e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.727e+01;TLOD=9.49    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB     0/1:57,6:0.180:25,5:32,1:38:381,385:40:39:0.00,0.091,0.095:0.013,0.025,0.962    0/0:36,0:0.091:17,0:19,0:0:377,0:0:0
    
    

    Below is the summary of filters applied:

    Filter                 SampleA    SampleB
    Total mutations        316,063    362,049
    clustered_events       204,962    257,140
    artifact_in_normal     191,278    223,794
    t_lod                  97,168     110,921
    mapping_quality        71,120     89,178
    panel_of_normals       66,700     79,148
    contamination          56,028     0
    base_quality           20,330     18,031
    read_position          13,584     16,023
    str_contraction        11,233     9,895
    multiallelic           5,221      6,161
    germline_risk          5,086      6,502
    strand_artifact        4,232      3,215
    fragment_length        1,634      1,651
    duplicate_evidence     0          0

    In the tumor-normal data, the germline_risk filter affected relatively few calls, but many calls were filtered by clustered_events and artifact_in_normal.

  • @vsvinti, I am glad you found it helpful! I agree that there is no benefit in doing both with the same set of normal samples. I am not sure how many normal samples you have, but I would probably not try to build a germline resource myself unless thousands of normal samples were available. I am generally happy with the PoN filter.

  • @sutturka, thanks for the interesting summary. I am seeing a similar picture with my human data. I found it helpful to evaluate the filters manually and adjust their thresholds, especially for the filters that remove the most calls.
    Given the large artifact_in_normal counts, I wonder whether your matched normals could be contaminated with tumor (residual tumor cells?). I had that contamination issue and had to adjust parameters accordingly. FilterMutectCalls provides some guidance on this. You may already be aware of it, though.

  • Sheila · Broad Institute · Member, Broadie, Moderator

    Hi,

    @vsvinti

    According to the second link "The tool filters variant sites in the panel of normals (PoN) or the matched normal." so perhaps there's no benefit of doing both ?!

    There may be a benefit if an artifact is in the PoN and not in the normal, or vice versa. Sometimes these artifacts strike at random and may be caught by one but not the other. Have a look at the Mutect2 hands-on tutorial in the Presentations section for more information.

    @dayzcool
    Thank you for sharing your input!

    @sutturka
    That is a lot of calls filtered out by clustered_events in the second part. Can you post some example bamout screenshots of those sites? Also, did you try af-of-alleles-not-in-resource = 0 with the tumor-normal data? I am wondering whether adjusting that will help with the germline risk filter. Also, are you indeed inputting a germline resource file?

    Thanks,
    Sheila

  • @dayzcool
    Thank you for the suggestions. I do see some level of contamination (0.000139) in SampleA but not in SampleB. I will go through the FilterMutectCalls guidance in detail.

    @vsvinti
    How do your calls look after using the corrected command? Do you get PASS calls with GATK4?

    @sheila
    I have attached a few screenshots of clustered_events calls. I do see a higher density of calls around these positions. Do you have any additional suggestions for adjusting the thresholds for clustered_events or artifact_in_normal?

    I have started a run with af-of-alleles-not-in-resource = 0 and will update you when the results are available.

    Yes, I am using the germline resource file with --germline-resource /VCF/722g.990.SNP.INDEL.chrAll.vcf.gz and I do not get any error.

    In the meantime, I am running the old MuTect2 (GATK/3.8.0) on the paired-normal data. It has not finished yet, but from a glance through the VCF file, I do see PASS calls with the same data. Below are the commands I used for both versions.

    ## GATK/3.8.0
    GenomeAnalysisTK -T MuTect2  \
    -R          /Reference/genome_ref.fasta  \
    -I:tumor    /input/Sample1_tumor.bam \
    -I:normal   /input/Sample1_normal.bam \
    --dbsnp  /VCF/722g.990.SNP.INDEL.chrAll.vcf.gz  \
    --normal_panel  /DogSD_PON/DogSD_68sample_PON.vcf.gz  \
    -L  chromosome.list  \
    -o Sample1.vcf
    
    ## GATK/4.0.3.0
    
    gatk Mutect2   \
    --reference /Reference/genome_ref.fasta  \
    --input  /input/Sample1_tumor.bam \
    --input  /input/Sample1_normal.bam \
    --normal  Sample1_normal \
    --tumor Sample1_tumor \
    --panel-of-normals DogSD_68sample_PON.vcf.gz  \
    --germline-resource /VCF/722g.990.SNP.INDEL.chrAll.vcf.gz  \
    --af-of-alleles-not-in-resource 0.000693  \
    --disable-read-filter MateOnSameContigOrNoMappedMateReadFilter  \
    --intervals chromosome.list  \
    --output  Sample1_somatic_m2.vcf.gz  \
    --bam-output Sample1_somatic_m2.bam  
    
    
    gatk FilterMutectCalls   \
    -V Sample1_somatic_m2.vcf.gz   \
    --contamination-table Sample1_contamination.table  \
    -O  Sample1_somatic_filter1.vcf.gz
    

    Below are calls that differ between GATK 3.8.0 and GATK4:

    #  GATK/3.8.0
    chr1    431038  .   T   G   .   PASS    ECNT=1;HCNT=3;MAX_ED=.;MIN_ED=.;NLOD=17.32;TLOD=6.58    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:89,10:0.103:10:0:1.00:3002,153:31:58    0/0:59,0:0.00:0:0:.:1995,0:27:32
    chr1    528974  .   T   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=15.86;TLOD=8.20    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:107,10:0.080:10:0:1.00:3600,234:48:59   0/0:57,1:0.018:1:0:1.00:2040,12:28:29
    chr1    579056  .   A   T   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=17.76;TLOD=7.32    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:100,5:0.049:1:4:0.800:3726,182:50:50    0/0:59,0:0.00:0:0:.:2168,0:29:30 
    chr1    941818  .   T   G   .   PASS    ECNT=1;HCNT=8;MAX_ED=.;MIN_ED=.;NLOD=14.21;TLOD=6.83    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:79,9:0.095:9:0:1.00:2655,165:37:42  0/0:52,1:0.020:0:1:0.00:1762,12:22:30
    chr1    2063471 .   T   C   .   PASS    ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=15.35;TLOD=78.29   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:74,26:0.258:18:8:0.692:2652,940:41:33   0/0:51,0:0.00:0:0:.:1814,0:25:26
    chr1    2080608 .   A   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=15.65;TLOD=72.06   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:79,27:0.260:10:17:0.630:2873,851:38:41  0/0:53,0:0.00:0:0:.:1943,0:35:18
    chr1    2118618 .   T   A   .   PASS    ECNT=1;HCNT=142;MAX_ED=.;MIN_ED=.;NLOD=10.63;TLOD=7.51  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:110,14:0.105:10:4:0.714:3585,155:45:65  0/0:48,1:0.042:0:1:0.00:1629,8:25:23
    chr1    2216598 .   T   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=16.72;TLOD=9.39    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:111,11:0.093:10:1:0.909:3694,235:52:59  0/0:57,0:0.00:0:0:.:1933,0:34:23
    chr1    2724602 .   T   G   .   PASS    ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=18.73;TLOD=14.33  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:115,15:0.110:15:0:1.00:3702,318:60:55   0/0:66,1:0.015:1:0:1.00:2281,13:37:29
    chr1    2843695 .   A   C   .   PASS    ECNT=1;HCNT=2;MAX_ED=.;MIN_ED=.;NLOD=15.00;TLOD=6.60    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:136,8:0.058:2:6:0.750:4291,175:70:66    0/0:51,0:0.00:0:0:.:1740,0:29:22
    chr1    3326229 .   T   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=7.78;TLOD=6.49 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:48,7:0.115:6:1:0.857:1418,150:23:25 0/0:26,0:0.00:0:0:.:895,0:9:17
    chr1    3438155 .   T   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=16.25;TLOD=7.61    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:107,10:0.089:0:10:0.00:3628,197:55:52   0/0:57,0:0.00:0:0:.:2024,0:36:21
    chr1    3863236 .   C   T   .   PASS    ECNT=2;HCNT=4;MAX_ED=1;MIN_ED=1;NLOD=11.44;TLOD=8.24    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1  0/1:63,3:0.047:1:2:0.667:0|1:3863236_C_T:2305,116:29:34 0/0:41,0:0.00:0:0:.:0|1:3863236_C_T:1512,0:22:19
    chr1    3863237 .   A   G   .   PASS    ECNT=2;HCNT=4;MAX_ED=1;MIN_ED=1;NLOD=11.44;TLOD=8.24    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1  0/1:63,3:0.047:1:2:0.667:0|1:3863236_C_T:2308,116:29:34 0/0:41,0:0.00:0:0:.:0|1:3863236_C_T:1485,0:22:19
    chr1    3929376 .   A   C   .   PASS    ECNT=1;HCNT=20;MAX_ED=.;MIN_ED=.;NLOD=13.94;TLOD=6.90   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:97,13:0.097:2:11:0.846:3102,276:52:45   0/0:53,1:0.020:1:0:0.00:1598,12:28:25
    chr1    3999946 .   G   A   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=18.06;TLOD=98.90   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:93,33:0.260:18:15:0.545:3456,1162:38:55 0/0:62,0:0.00:0:0:.:2260,0:34:28
    chr1    4015001 .   G   A   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=14.75;TLOD=87.34   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:108,29:0.221:13:16:0.448:3936,1043:56:52    0/0:53,0:0.00:0:0:.:1929,0:32:21
    chr1    4364129 .   A   C   .   PASS    ECNT=1;HCNT=6;MAX_ED=.;MIN_ED=.;NLOD=9.25;TLOD=9.20 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:60,5:0.078:0:5:1.00:1962,131:41:19  0/0:31,0:0.00:0:0:.:989,0:20:11
    chr1    5492054 .   C   T   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=18.65;TLOD=111.29  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:71,36:0.320:20:16:0.444:2321,1352:36:35 0/0:63,0:0.00:0:0:.:1970,0:35:28
    chr1    5577843 .   G   A   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=14.05;TLOD=12.49   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:103,6:0.057:3:3:0.500:3692,199:51:52    0/0:47,0:0.00:0:0:.:1645,0:23:24
    chr1    5582896 .   T   G   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=17.06;TLOD=78.13   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:72,26:0.260:15:11:0.577:2491,936:37:35  0/0:58,0:0.00:0:0:.:2011,0:26:32
    chr1    5623753 .   A   C   .   PASS    ECNT=1;HCNT=16;MAX_ED=.;MIN_ED=.;NLOD=15.05;TLOD=7.46   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:121,8:0.063:8:0:0.00:4032,183:51:70 0/0:51,0:0.00:0:0:.:1812,0:24:27
    chr1    5646316 .   C   A   .   PASS    ECNT=1;HCNT=56;MAX_ED=.;MIN_ED=.;NLOD=2.96;TLOD=7.33    GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:34,5:0.128:2:3:0.600:1115,117:18:16 0/0:10,0:0.00:0:0:.:317,0:4:6
    chr1    5664386 .   T   G   .   PASS    ECNT=1;HCNT=6;MAX_ED=.;MIN_ED=.;NLOD=19.43;TLOD=109.98  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:72,35:0.327:19:16:0.543:2625,1234:36:36 0/0:69,0:0.00:0:0:.:2479,0:35:34
    chr1    5795127 .   C   A   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=13.85;TLOD=27.63   GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:84,10:0.109:6:4:0.400:3021,366:42:42    0/0:46,0:0.00:0:0:.:1645,0:23:23
    chr1    5941151 .   A   C   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=15.00;TLOD=118.43  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:81,37:0.324:19:18:0.486:2932,1352:39:42 0/0:53,0:0.00:0:0:.:1915,0:24:29
    chr1    6120717 .   AT  A   .   PASS    ECNT=1;HCNT=1;MAX_ED=.;MIN_ED=.;NLOD=11.29;RPA=8,7;RU=T;STR;TLOD=46.61  GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:QSS:REF_F1R2:REF_F2R1  0/1:49,30:0.364:17:13:.:1731,1082:16:33 0/0:38,0:0.00:0:0:.:1367,0:21:17
    
    # GATK/4.0.3.0
    chr1    431038  .   T   G   .   base_quality    DP=159;ECNT=1;NLOD=17.32;N_ART_LOD=-1.776e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.913e+01;TLOD=5.63    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:86,10:0.114:30,10:56,0:19:337,364:60:20:0.00,0.101,0.104:0.824,5.052e-04,0.176  0/0:58,0:0.022:27,0:31,0:0:368,0:0:0
    chr1    528974  .   T   G   .   .   DP=172;ECNT=1;NLOD=15.86;N_ART_LOD=-1.726e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-3.150e+01;TLOD=10.21   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:101,10:0.102:45,10:56,0:21:361,334:60:20:0.00,0.091,0.090:0.924,1.766e-04,0.076 0/0:55,1:0.041:27,1:28,0:12:352,355:60:20
    chr1    579056  .   A   T   .   mapping_quality DP=164;ECNT=1;NLOD=17.45;N_ART_LOD=-1.777e+00;POP_AF=6.930e-04;P_CONTAM=2.134e-08;P_GERMLINE=-3.691e+01;TLOD=6.02   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:99,5:0.048:49,1:50,4:36:351,331:25:12:0.051,0.00,0.048:2.203e-03,0.147,0.851    0/0:58,0:0.017:28,0:30,0:0:344,0:0:0
    chr1    941818  .   T   G   .   base_quality    DP=138;ECNT=1;NLOD=14.21;N_ART_LOD=-1.678e+00;POP_AF=6.930e-04;P_CONTAM=2.618e-18;P_GERMLINE=-2.459e+01;TLOD=5.93   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:75,8:0.110:36,8:39,0:14:349,389:60:20:0.00,0.101,0.096:0.859,3.974e-04,0.141    0/0:50,1:0.042:21,0:29,1:12:368,372:60:20
    chr1    2063471 .   T   C   .   .   DP=155;ECNT=1;NLOD=15.35;N_ART_LOD=-1.727e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.725e+01;TLOD=80.79   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:73,26:0.267:40,17:33,9:37:360,357:60:26:0.253,0.192,0.263:5.520e-03,0.109,0.886 0/0:51,0:0.028:25,0:26,0:0:349,0:0:0
    chr1    2080608 .   A   G   .   .   DP=163;ECNT=1;NLOD=15.65;N_ART_LOD=-1.741e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.795e+01;TLOD=71.08   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:77,27:0.265:36,10:41,17:29:341,343:60:20:0.222,0.253,0.260:0.024,9.716e-03,0.966    0/0:52,0:0.044:34,0:18,0:0:357,0:0:0
    chr1    2118618 .   T   A   .   base_quality;strand_artifact    DP=167;ECNT=1;NLOD=12.23;N_ART_LOD=-1.301e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.775e+01;TLOD=6.41    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:106,12:0.108:44,8:62,4:10:336,365:60:21:0.00,0.101,0.102:0.996,9.193e-06,4.483e-03  0/0:46,1:0.031:24,1:22,0:0:350,403:60:0
    chr1    2216598 .   T   G   .   base_quality    DP=179;ECNT=1;NLOD=16.72;N_ART_LOD=-1.758e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-3.313e+01;TLOD=8.40    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:107,11:0.111:51,10:56,1:17:358,340:60:20:0.00,0.091,0.093:0.957,9.543e-05,0.043 0/0:56,0:0.014:34,0:22,0:0:367,0:0:0
    chr1    2724602 .   T   G   .   strand_artifact DP=199;ECNT=1;NLOD=18.73;N_ART_LOD=-1.797e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-3.398e+01;TLOD=13.36   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:113,15:0.132:59,15:54,0:27:366,337:60:20:0.00,0.121,0.117:0.997,6.813e-06,3.193e-03 0/0:65,1:0.032:36,1:29,0:13:373,378:60:20
    chr1    2843695 .   A   C   .   .   DP=194;ECNT=1;NLOD=15.00;N_ART_LOD=-1.711e+00;POP_AF=6.930e-04;P_CONTAM=3.313e-16;P_GERMLINE=-4.052e+01;TLOD=5.55   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:132,8:0.076:67,2:65,6:20:352,356:60:34:0.061,0.00,0.057:1.786e-03,0.241,0.758   0/0:50,0:0.012:28,0:22,0:0:375,0:0:0
    chr1    3326229 .   T   G   .   .   DP=81;ECNT=1;NLOD=7.78;N_ART_LOD=-1.437e+00;POP_AF=6.930e-04;P_CONTAM=1.675e-13;P_GERMLINE=-1.213e+01;TLOD=5.55 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:46,6:0.145:21,5:25,1:23:347,317:60:20:0.00,0.111,0.115:0.016,0.022,0.962    0/0:26,0:0.022:9,0:17,0:0:362,0:0:0
    chr1    3438155 .   T   G   .   base_quality    DP=167;ECNT=1;NLOD=16.25;N_ART_LOD=-1.740e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-3.218e+01;TLOD=6.68    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:102,10:0.096:53,0:49,10:13:341,367:60:20:0.091,0.00,0.089:1.210e-03,0.570,0.429 0/0:54,0:1.903e-04:34,0:20,0:0:362,0:0:0
    chr1    3863236 .   C   T   .   mapping_quality DP=96;ECNT=2;NLOD=9.03;N_ART_LOD=-1.491e+00;POP_AF=2.878e-03;P_CONTAM=4.849e-03;P_GERMLINE=-1.890e+01;TLOD=7.09 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:60,3:0.068:28,1:32,2:39:350,332:29:20:0|1:3863236_C_T:0.00,0.051,0.048:0.041,4.190e-03,0.955    0/0:30,0:4.612e-06:16,0:14,0:0:338,0:0:0:0|1:3863236_C_T
    chr1    3863237 .   A   G   .   mapping_quality DP=96;ECNT=2;NLOD=9.03;N_ART_LOD=-1.491e+00;POP_AF=3.597e-03;P_CONTAM=6.066e-03;P_GERMLINE=-1.880e+01;TLOD=7.09 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:PGT:PID:SA_MAP_AF:SA_POST_PROB 0/1:60,3:0.068:28,1:32,2:39:350,332:29:20:0|1:3863236_C_T:0.00,0.051,0.048:0.041,4.190e-03,0.955    0/0:30,0:4.612e-06:16,0:14,0:0:338,0:0:0:0|1:3863236_C_T
    chr1    3929376 .   A   C   .   base_quality    DP=160;ECNT=1;NLOD=13.94;N_ART_LOD=-1.671e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.769e+01;TLOD=5.93    GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:94,10:0.119:50,1:44,9:17:356,315:60:27:0.101,0.00,0.096:2.294e-04,0.908,0.092   0/0:50,1:0.044:26,1:24,0:12:374,343:60:14
    chr1    3999946 .   G   A   .   .   DP=187;ECNT=1;NLOD=18.06;N_ART_LOD=-1.791e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.132e+01;TLOD=98.05   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:91,32:0.264:37,17:54,15:37:355,360:60:20:0.232,0.253,0.260:0.025,8.659e-03,0.966    0/0:60,0:0.016:33,0:27,0:0:373,0:0:0
    chr1    4015001 .   G   A   .   .   DP=187;ECNT=1;NLOD=15.05;N_ART_LOD=-1.722e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.126e+01;TLOD=86.31   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:102,29:0.225:53,13:49,16:37:343,360:60:20:0.192,0.212,0.221:0.016,0.010,0.974   0/0:50,0:0.037:29,0:21,0:0:352,0:0:0
    chr1    4364129 .   A   C   .   .   DP=151;ECNT=1;NLOD=12.85;N_ART_LOD=-1.646e+00;POP_AF=6.930e-04;P_CONTAM=1.896e-08;P_GERMLINE=-3.150e+01;TLOD=6.92   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:97,5:0.075:60,0:37,5:27:349,329:60:22:0.051,0.00,0.049:2.020e-03,0.192,0.806    0/0:43,0:0.016:22,0:21,0:0:367,0:0:0
    chr1    5492054 .   C   T   .   .   DP=170;ECNT=1;NLOD=18.65;N_ART_LOD=-1.808e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.859e+01;TLOD=110.23  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:71,33:0.319:37,17:34,16:38:337,358:60:20:0.283,0.303,0.317:0.030,9.742e-03,0.960    0/0:62,0:0.023:35,0:27,0:0:352,0:0:0
    chr1    5577843 .   G   A   .   .   DP=155;ECNT=1;NLOD=14.05;N_ART_LOD=-1.680e+00;POP_AF=6.930e-04;P_CONTAM=2.134e-08;P_GERMLINE=-3.339e+01;TLOD=8.37   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:99,5:0.065:50,2:49,3:37:348,399:60:20:0.040,0.040,0.048:8.340e-03,6.514e-03,0.985   0/0:47,0:4.382e-03:23,0:24,0:0:360,0:0:0
    chr1    5582896 .   T   G   .   .   DP=158;ECNT=1;NLOD=17.06;N_ART_LOD=-1.769e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.890e+01;TLOD=77.18   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:71,25:0.270:37,14:34,11:37:340,323:60:20:0.242,0.232,0.260:0.011,0.020,0.968    0/0:57,0:0.020:25,0:32,0:0:367,0:0:0
    chr1    5623753 .   A   C   .   base_quality    DP=187;ECNT=1;NLOD=15.05;N_ART_LOD=-1.718e+00;POP_AF=6.930e-04;P_CONTAM=1.570e-19;P_GERMLINE=-3.640e+01;TLOD=6.43   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:120,9:0.091:52,9:68,0:13:345,302:53:50:0.00,0.071,0.070:0.574,9.729e-04,0.425   0/0:50,0:0.028:24,0:26,0:0:344,0:0:0
    chr1    5646316 .   C   A   .   base_quality;read_position  DP=57;ECNT=2;NLOD=3.26;N_ART_LOD=-1.092e+00;POP_AF=6.930e-04;P_CONTAM=3.504e-14;P_GERMLINE=-5.220e+00;TLOD=8.73 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:36,6:0.173:20,3:16,3:19:327,342:60:0:0.00,0.141,0.143:0.069,7.672e-03,0.923 0/0:11,0:0.050:4,0:7,0:0:381,0:0:0
    chr1    5664386 .   T   G   .   .   DP=171;ECNT=1;NLOD=19.43;N_ART_LOD=-1.821e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.905e+01;TLOD=109.05  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:70,34:0.330:35,18:35,16:36:334,348:60:33:0.313,0.263,0.327:6.519e-03,0.087,0.907    0/0:65,0:0.012:34,0:31,0:0:345,0:0:0
    chr1    5795127 .   C   A   .   .   DP=140;ECNT=1;NLOD=14.15;N_ART_LOD=-1.681e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-2.470e+01;TLOD=26.59   GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:81,10:0.118:40,6:41,4:37:330,334:60:20:0.091,0.101,0.110:0.011,0.011,0.979  0/0:47,0:1.124e-04:23,0:24,0:0:365,0:0:0
    chr1    5941151 .   A   C   .   .   DP=171;ECNT=1;NLOD=15.00;N_ART_LOD=-1.725e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-1.489e+01;TLOD=117.16  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:75,36:0.333:36,19:39,17:37:332,348:60:20:0.303,0.303,0.324:0.023,0.011,0.966    0/0:50,0:0.047:23,0:27,0:0:373,0:0:0
    chr1    6120717 .   AT  A   .   .   DP=129;ECNT=1;NLOD=11.42;N_ART_LOD=-1.609e+00;POP_AF=6.930e-04;P_CONTAM=0.00;P_GERMLINE=-9.518e+00;RPA=8,7;RU=T;STR;TLOD=63.03  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:49,28:0.380:16,16:33,12:36:353,354:60:33:0.343,0.333,0.364:0.013,0.026,0.961    0/0:38,0:0.049:21,0:17,0:0:344,0:0:0
    
    

    Although the GATK/3.8.0 run is not complete, so far I see more than 2,000 PASS calls. When I extract the same positions from the GATK/4.0.3.0 filtered VCF, more than 1,600 of those calls have . in the FILTER column. Could something be wrong with how PASS is assigned in GATK/4.0.3.0?
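One way to make this comparison reproducible is to collect the positions that are PASS in the GATK 3.8.0 VCF and tally the FILTER column of the GATK4 VCF at those positions, keeping "." separate from PASS. Per the VCF specification, "." in FILTER means that no filters were applied, not that the record passed, so a large "." count after FilterMutectCalls is worth checking against the tool version. This is a minimal sketch: the helper names are hypothetical, and it matches records on chromosome and position only (not on alleles).

```python
"""Sketch: tally GATK4 FILTER values at positions that are PASS in a GATK 3.8.0 VCF.

Matching is by (CHROM, POS) only; helper names are hypothetical.
"""
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Position = Tuple[str, int]


def records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-split columns of each data line, skipping headers and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t")


def pass_positions(lines: Iterable[str]) -> Set[Position]:
    """Collect (CHROM, POS) for records whose FILTER column is exactly PASS."""
    return {(cols[0], int(cols[1])) for cols in records(lines) if cols[6] == "PASS"}


def tally_filters(lines: Iterable[str], positions: Optional[Set[Position]] = None) -> Counter:
    """Count FILTER strings (e.g. 'PASS', '.', 'base_quality;strand_artifact').

    If positions is given, only records at those positions are counted.
    """
    counts: Counter = Counter()
    for cols in records(lines):
        if positions is not None and (cols[0], int(cols[1])) not in positions:
            continue
        counts[cols[6]] += 1
    return counts
```

To use it, read both VCFs as text (for example with `gzip.open(path, "rt")` for `.vcf.gz` files) and call `tally_filters(gatk4_lines, pass_positions(gatk3_lines))`. The result keeps ".", PASS, and each filter combination as separate keys.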

  • @Sheila

    Also, did you try af-of-alleles-not-in-resource = 0 with the tumor-normal data?

    I do not see any PASS calls in the tumor-normal paired data with af-of-alleles-not-in-resource = 0. The only change I see is that fewer calls are filtered by germline_risk. In any case, very few calls were flagged as germline_risk with the matched normal.

    Calls per sample with af-of-alleles-not-in-resource set to 0 and to 0.000693:

    | Filter | SampleA (af = 0) | SampleB (af = 0) | SampleA (af = 0.000693) | SampleB (af = 0.000693) |
    |---|---|---|---|---|
    | Total mutations | 316,063 | 362,049 | 316,063 | 362,049 |
    | germline_risk | 4,443 | 5,598 | 5,086 | 6,502 |
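As a rough check on "very few", the germline_risk counts in the table above can be expressed as a percentage of all calls for each sample and setting. This sketch uses only the numbers reported in this thread.

```python
"""Share of calls flagged germline_risk, from the counts reported in this thread."""
from typing import Dict, Tuple

TOTAL_CALLS: Dict[str, int] = {"SampleA": 316_063, "SampleB": 362_049}

# Keyed by (sample, --af-of-alleles-not-in-resource value).
GERMLINE_RISK: Dict[Tuple[str, float], int] = {
    ("SampleA", 0.0): 4_443,
    ("SampleB", 0.0): 5_598,
    ("SampleA", 0.000693): 5_086,
    ("SampleB", 0.000693): 6_502,
}


def germline_risk_percent(sample: str, af: float) -> float:
    """Percentage of a sample's calls that carry the germline_risk filter."""
    return 100.0 * GERMLINE_RISK[(sample, af)] / TOTAL_CALLS[sample]


if __name__ == "__main__":
    for sample, af in GERMLINE_RISK:
        print(f"{sample}  af={af:<8}  {germline_risk_percent(sample, af):.2f}%")
```

This works out to about 1.41% (SampleA) and 1.55% (SampleB) with 0, and 1.61% and 1.80% with 0.000693, so the setting moves the germline_risk share by only a fraction of a percentage point.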
  • vsvinti Member

    Hi @sutturka

    Yes! Once I specify my matched normal, I get between 10k and 50k variants with PASS, depending on which samples I'm comparing. That's a lot, but I don't have any resources to add to the filtering. Perhaps the number would be a bit lower if I used a PON.

  • Thank you @vsvinti for the update.

    @Sheila Do you have any further suggestions regarding the data I posted above? I am running the old MuTect2 (GATK/3.8.0) on other samples, and for each one I see PASS calls, but not with the newer version. Maybe it is worth sending this data to the developers as a test case?

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @sutturka
    Hi,

    Do you have any additional suggestions for adjusting thresholds for clustered_events or artifact_in_normal?

    Sorry for the delay. I think this thread will help with the clustered_events filter. For the artifact_in_normal filter, you can try increasing --normal-artifact-lod.
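Before rerunning FilterMutectCalls with new thresholds, it can help to preview how many existing calls a given threshold would affect. The sketch below assumes (this is not confirmed in the thread) that clustered_events compares the INFO field ECNT against a maximum event count, and that artifact_in_normal compares N_ART_LOD against the --normal-artifact-lod threshold. The default arguments are placeholders to experiment with, not authoritative GATK defaults, and the function names are hypothetical.

```python
"""Sketch: preview which calls two FilterMutectCalls-style thresholds would flag.

Assumptions: clustered_events ~ ECNT > max_events, and
artifact_in_normal ~ N_ART_LOD > normal_artifact_lod. Defaults are placeholders.
"""
from collections import Counter
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, str]:
    """Map INFO keys to raw string values; flag entries without '=' are ignored."""
    return dict(entry.split("=", 1) for entry in info.split(";") if "=" in entry)


def preview_filters(info: str, max_events: int = 2, normal_artifact_lod: float = 0.0) -> List[str]:
    """Return the names of the (assumed) filters this INFO string would trigger."""
    fields = parse_info(info)
    flagged = []
    if int(fields.get("ECNT", "0")) > max_events:
        flagged.append("clustered_events")
    # N_ART_LOD may hold one value per alt allele; use the largest.
    n_art_lod = max(float(v) for v in fields.get("N_ART_LOD", "-inf").split(","))
    if n_art_lod > normal_artifact_lod:
        flagged.append("artifact_in_normal")
    return flagged


def count_flagged(lines: Iterable[str], **thresholds) -> Counter:
    """Tally preview_filters results over the data lines of a VCF."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        info = line.rstrip("\n").split("\t")[7]
        counts.update(preview_filters(info, **thresholds))
    return counts
```

The GATK4 records posted earlier in this thread all have ECNT of 1 or 2 and negative N_ART_LOD values, so neither assumed rule flags them, which is consistent with none of them carrying clustered_events or artifact_in_normal.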

    When I extract the same positions from the GATK/4.0.3.0 filtered VCF, more than 1,600 of those calls have . in the FILTER column. Could something be wrong with how PASS is assigned in GATK/4.0.3.0?

    Ah, this was an issue that has since been fixed :smile: Sorry for the confusion. The PASS was a . for a little bit, but it is now back to being PASS.

    -Sheila

  • Thanks @Sheila for the suggestions.

    The PASS was a . for a little bit, but it is now back to being PASS.

    So this issue has been fixed in GATK/4.0.4.0? Just wanted to make sure I am using the correct version.

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @sutturka
    Hi,

    Yes, the issue is fixed in 4.0.4.0 :smile: You can read more in the release notes.

    -Sheila
