How to tell whether a variant is a sequencing error or real?

I would like to know whether there are any references that explain how GATK decides whether a variant is an error or real, for both germline calling and Mutect2 calling.


  • can anyone help?

  • Thanks a lot. I am not talking about the docs here. I am referring to what you once said, that we can use MBQ and AD to estimate the number of error reads.

    For example, take a site like the following:

    chr20 57484460 . TG T . PASS DP=236;ECNT=2;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.988e+01;RPA=2,1;RU=G;STR;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:230,6:0.029:236:230,6:0,0:28,30:171,171:60:51:false:false:0:0.020,0.010,0.025:3.382e-03,5.781e-03,0.991

    So the MBQ for ref and alt is 28 and 30, which is an error rate of about 0.001 (Q30). 236 * 0.001 = 0.236 expected error reads, much lower than the 6 alt reads observed, so the site passed. You once answered a question like this.
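
The arithmetic in this post can be written out as a short Python sketch. It only reproduces the poster's back-of-the-envelope estimate (Phred quality to error probability, then depth times error rate); the function names are made up for illustration, and this is not how Mutect2 itself scores a site.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_error_reads(depth: int, base_quality: float) -> float:
    """Expected number of erroneous reads if every read errs independently
    at the rate implied by base_quality (a simplifying assumption)."""
    return depth * phred_to_error_prob(base_quality)


# Values taken from the chr20:57484460 record above: DP=236, AD=230,6, MBQ=28,30
depth = 236
alt_count = 6
alt_mbq = 30

error_rate = phred_to_error_prob(alt_mbq)
expected = expected_error_reads(depth, alt_mbq)

print(f"error rate at Q{alt_mbq}: {error_rate:.4f}")
print(f"expected error reads: {expected:.3f}, observed alt reads: {alt_count}")
```

The estimate treats every read as an independent trial with the same error rate, which is exactly the simplification the poster is making.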

  • davidben Boston Member, Broadie, Dev

    To be clear, the algorithm of Mutect2 is much more sophisticated than this. However, back-of-the-envelope calculations such as the above can still give a rough sense of things. To understand what Mutect2 actually does one has to read the documentation.

  • 2904359495 Member
    edited September 16

    The docs are really too difficult to understand. If there were some concrete examples, such as a real variant site, showing how Mutect2 calculates its result, it would be much friendlier to us ordinary users.
    Thanks a lot.

    For example, a site like the following:
    chr18 48593530 . TA T . clustered_events;t_lod DP=234;ECNT=5;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.263e+01;RPA=2,1;RU=A;STR;TLOD=3.15 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:191,3:0.020:194:191,3:0,0:22,26:171,171:60:16:false:false:0:0.020,0.00,0.015:0.013,3.253e-03,0.984

    I know the tumor LOD threshold (default 5.3) and that the TLOD here is 3.15, which is why the site gets the t_lod filter. What I am concerned with is how this site is distinguished as a sequencing error rather than a real somatic variant.
    Here the alt base quality is 26, so the error rate is about 0.0025, and 194 * 0.0025 = 0.48 expected error reads, so by that measure it also seems fine.
    Can you analyse this site using this method?
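
One way to take this estimate a step further is to ask how likely it is to see at least the observed number of alt reads from errors alone. The sketch below uses a plain binomial tail for that. This is a swapped-in textbook calculation under strong assumptions (independent reads, one uniform error rate from the alt MBQ), not the Mutect2 likelihood model, and the helper names are hypothetical.

```python
from math import comb


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Values taken from the chr18:48593530 record above: DP=194, AD=191,3, MBQ=22,26
depth = 194
alt_count = 3
alt_mbq = 26

p_err = phred_to_error_prob(alt_mbq)
expected = depth * p_err
tail = binomial_tail(depth, alt_count, p_err)

print(f"expected error reads: {expected:.2f}")
print(f"P(at least {alt_count} error reads by chance): {tail:.4f}")
```

Under these assumptions the tail probability comes out on the order of 1%, so 3 alt reads are less surprising here than 6 alt reads were at the chr20 site. The model ignores much of what the record itself reports: both example sites are deletions flagged STR, and a base quality describes substitution errors rather than indel errors in repeats.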
