How to tell whether a variant is a sequencing error or real?

I want to know whether there are any references that describe how GATK decides whether a variant is an error or real, for both germline calling and Mutect2 calling.


  • 2904359495 Member ✭✭

    Can anyone help?

  • 2904359495 Member ✭✭

    Thanks a lot. I am not asking about the documentation here; I mean that you once said we can use MBQ and AD to estimate the number of error reads.

    For example, take a site like the following:

    chr20 57484460 . TG T . PASS DP=236;ECNT=2;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.988e+01;RPA=2,1;RU=G;STR;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:230,6:0.029:236:230,6:0,0:28,30:171,171:60:51:false:false:0:0.020,0.010,0.025:3.382e-03,5.781e-03,0.991

    So the MBQ for ref and alt is 28 and 30, which corresponds to an error rate of roughly 0.001. Then 236*0.001 = 0.236, much lower than the 6 alt reads, so the site passed. You once answered a question like this.
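A minimal Python sketch of the back-of-the-envelope check described in this post, assuming every read at the site has an error probability equal to the alt allele's median base quality (MBQ) converted from the Phred scale. It also goes one step further and asks how likely it is to see that many error reads by chance, using a Poisson approximation; that step and the function names are illustrative assumptions, not anything Mutect2 is known to do. One more caveat: this record is a one-base deletion in a short repeat (STR, RU=G in INFO), where errors are more likely to come from polymerase slippage than from base miscalls, so a base-quality-derived rate is an even rougher proxy here.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_error_reads(depth: int, q: float) -> float:
    """Expected number of miscalled reads at the site if every read had
    error probability phred_to_error(q) (a strong simplification)."""
    return depth * phred_to_error(q)


def poisson_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), used here as an approximation to the
    count of error reads (an illustrative assumption, not Mutect2's model)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


if __name__ == "__main__":
    # chr20 57484460 TG>T: FORMAT DP=236, AD=230,6, MBQ=28,30 (alt MBQ used)
    depth, alt_reads, alt_mbq = 236, 6, 30
    lam = expected_error_reads(depth, alt_mbq)
    print(f"expected error reads: {lam:.3f}")  # 0.236
    print(f"P(>= {alt_reads} error reads): {poisson_tail(alt_reads, lam):.1e}")
```

Under these assumptions the chance of 6 or more error reads is roughly 2e-7, consistent with the intuition that 6 alt reads are hard to explain by base-call errors alone.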

  • davidben Boston Member, Broadie, Dev ✭✭✭

    To be clear, the algorithm of Mutect2 is much more sophisticated than this. However, back-of-the-envelope calculations such as the above can still give a rough sense of things. To understand what Mutect2 actually does, one has to read the documentation.

  • 2904359495 Member ✭✭
    edited September 2019

    The documentation is really too difficult to understand. If there were some concrete examples, such as a real variant site, explaining how Mutect2 calculates this, it would be much friendlier to us ordinary users.
    Thanks a lot.

    For example, take the following site:
    chr18 48593530 . TA T . clustered_events;t_lod DP=234;ECNT=5;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.263e+01;RPA=2,1;RU=A;STR;TLOD=3.15 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:191,3:0.020:194:191,3:0,0:22,26:171,171:60:16:false:false:0:0.020,0.00,0.015:0.013,3.253e-03,0.984

    I know the tumor LOD threshold (default 5.3); here TLOD is 3.15, so the site gets the t_lod filter. What I am concerned about is how this site is judged to be a sequencing error rather than a real somatic variant.
    Here the alt base quality is 26, so the error rate is about 0.0025, and 194*0.0025 = 0.485, well below the 3 alt reads, so by this check it also seems OK.
    So can you analyze this site using this method?
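The same rough check applied to both records in this thread, sketched in Python under the same simplifying assumptions (one error rate per site taken from the alt MBQ, reads erring independently), but with an exact binomial tail rather than only comparing the expected count with the alt read count. The helper names are hypothetical, and this is not Mutect2's TLOD calculation.

```python
import math


def phred_to_error(q: float) -> float:
    """Phred-scaled quality Q -> error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more of n reads
    are errors if each read errs independently with probability p."""
    cdf = sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - cdf)


# (label, depth, alt reads, alt MBQ) taken from the two records in this thread
sites = [
    ("chr20:57484460 TG>T", 236, 6, 30),
    ("chr18:48593530 TA>T", 194, 3, 26),
]

for label, depth, alt_reads, alt_mbq in sites:
    p_err = phred_to_error(alt_mbq)
    tail = binomial_tail(alt_reads, depth, p_err)
    print(f"{label}: expected errors {depth * p_err:.2f}, "
          f"P(>= {alt_reads} error reads) = {tail:.1e}")
```

By this sketch the chr18 site expects about 0.49 error reads, yet the chance of 3 or more is roughly 1%, compared with roughly 2e-7 for the chr20 site. Whether 1% is "small" depends on how many candidate sites are examined: across a hypothetical million sites, it would correspond to on the order of ten thousand sites showing 3 or more error reads. That is one intuition, not the actual model, for why a site with only 3 alt reads can look fine under the expected-count check yet still fall below the t_lod threshold.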
