GATK 4.0.4.0 Mutect2 - filtering multiallelic calls

mbyvcm Member
edited September 2018 in Ask the GATK team

I want to remove positions from my VCF that have more than one alternate allele in the ALT field. At most sites with multiple ALT calls, Mutect2 has added the 'multiallelic' FILTER. However, there are some sites where this filter is not applied:

1 16959732 . T TGGGCCCGCAGCA,TGGGCCTGCAGCA . PASS
19 31611623 . GA G,AA . PASS

Ultimately I am trying to filter my VCF by "AF", which GATK VariantFiltration will not let me do if there are multiple values for AF as a result of multiple alternate alleles.

Thanks
Chris


Answers

  • shlee Cambridge Member, Broadie, Moderator admin

    Hi @mbyvcm,

    I think you asked the filter-by-allele-fraction question in https://gatkforums.broadinstitute.org/gatk/discussion/12978 and I have answered that particular question.

    You are noticing that Mutect2/FilterMutectCalls does not apply the multiallelic filter (explained briefly in Article#11127) to some sites that list multiple alleles. Here are some example records from the hands-on tutorial example data showing the same situation: multiple ALT alleles but no multiallelic filter.

    chr17   4539344 .       TA      T,TAA   .       artifact_in_normal;germline_risk;panel_of_normals;str_contraction       DP=44;ECNT=1;IN_PON;NLOD=-9.649e+00,2.56;N_ART_LOD=9.74,0.292;POP_AF=1.989e-04,9.200e-04;P_CONTAM=2.156e-10;P_GERMLINE=-9.948e-14,-1.024e-01;RPA=18,17,19;RU=A;STR;TLOD=3.32,9.76       GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/0:5,7,1:0.444,0.166:4,3,1:1,4,0:31,31:140,120,204:60,60:22,34:false:false     0/1/2:1,2,8:0.262,0.522:1,2,5:0,0,3:32,27:92,187,144:60,60:34,19:false:false:0.707,0.707,0.727:0.021,0.034,0.945
    chr17   47157394        .       CAA     C,CAAA  .       artifact_in_normal;germline_risk;panel_of_normals       DP=70;ECNT=1;IN_PON;NLOD=5.42,-3.686e+01;N_ART_LOD=-9.733e-02,37.79;POP_AF=0.012,6.122e-03;P_CONTAM=5.811e-16;P_GERMLINE=-1.096e+00,0.00;RPA=13,11,14;RU=A;STR;TLOD=4.72,28.57  GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/0:6,1,18:0.128,0.598:1,0,12:5,1,6:29,24:177,215,149:60,60:14,20:false:false   0/1/2:4,2,13:0.202,0.526:2,1,4:2,1,9:33,23:141,109,151:60,60:17,18:false:false:0.667,0.657,0.684:0.022,0.029,0.949
    chr17   68907890        .       GA      G,GAA   .       artifact_in_normal;base_quality;germline_risk;panel_of_normals;str_contraction  DP=71;ECNT=1;IN_PON;NLOD=2.74,-9.806e+00;N_ART_LOD=0.463,10.12;POP_AF=4.296e-03,1.000e-05;P_CONTAM=4.609e-04;P_GERMLINE=-2.210e+00,-1.055e-09;RPA=13,12,14;RU=A;STR;TLOD=4.63,6.72      GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:SA_MAP_AF:SA_POST_PROB 0/0:12,1,6:0.113,0.319:8,1,2:4,0,4:12,31:171,113,173:60,60:33,17:false:false    0/1/2:24,4,6:0.172,0.222:14,1,3:10,3,3:16,19:164,151,174:60,60:20,12:false:false:0.172,0.00,0.176:0.014,0.033,0.953
    

    In each of these cases, we see three common filters: artifact_in_normal;germline_risk;panel_of_normals. And in each of the normal sample's annotations, we see read counts (AD) supporting each of the alleles:

    GT:AD:AF
    0/0:5,7,1:0.444,0.166
    0/0:6,1,18:0.128,0.598
    0/0:12,1,6:0.113,0.319
    

    Compare to the tumor:

    0/1/2:1,2,8:0.262,0.522
    0/1/2:4,2,13:0.202,0.526
    0/1/2:24,4,6:0.172,0.222
    

    For each case, the three alleles are present in both the normal and the tumor. It appears these variant sites are likely artifactual and are filtered appropriately. What likely differentiates them from those sites with the multiallelic filter (also present in the same callset) is passing the tumor LOD threshold. A bit digressive, but you may be interested in Blog#11315's discussion of the multiallelic filter.
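
    Since the multiallelic filter is not a reliable marker for "more than one ALT allele", one option is to test the ALT column directly. The following is a minimal sketch in plain Python, not a GATK feature: it assumes an uncompressed, tab-delimited VCF, the function names are hypothetical, and the third record is a made-up biallelic site added only for illustration.

```python
from typing import Iterable, Iterator


def is_multiallelic(vcf_line: str) -> bool:
    """True if the ALT column (5th tab-separated field) lists several alleles."""
    alt = vcf_line.rstrip("\n").split("\t")[4]
    return "," in alt


def drop_multiallelic(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and records that have a single ALT allele."""
    for line in lines:
        if line.startswith("#") or not is_multiallelic(line):
            yield line


records = [
    "\t".join(["1", "16959732", ".", "T",
               "TGGGCCCGCAGCA,TGGGCCTGCAGCA", ".", "PASS"]),
    "\t".join(["19", "31611623", ".", "GA", "G,AA", ".", "PASS"]),
    # Hypothetical biallelic record, not from the original callset.
    "\t".join(["1", "100", ".", "A", "G", ".", "PASS"]),
]

kept = list(drop_multiallelic(records))
print(len(kept))
```

    Only the hypothetical single-ALT record survives; both PASS sites from the question are dropped. On a real callset, an existing tool is probably the safer route, for example bcftools view with --max-alleles 2.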

    I hope this is helpful.
