**We've moved!**

This site is now read-only. You can find our new documentation site and support forum for posting questions here.

Be sure to read our welcome blog!

# How to tell whether a variant is a sequencing error or real?

2904359495
Member ✭✭

## Answers

Can anyone help?

@2904359495 M2 documentation: https://github.com/broadinstitute/gatk/blob/master/docs/mutect/mutect.pdf

Thanks a lot. I'm not asking about the documentation here. You once said we can use MBQ and AD to estimate the number of error reads.

For example, take the following site:

`chr20 57484460 . TG T . PASS DP=236;ECNT=2;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.988e+01;RPA=2,1;RU=G;STR;TLOD=8.76 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:230,6:0.029:236:230,6:0,0:28,30:171,171:60:51:false:false:0:0.020,0.010,0.025:3.382e-03,5.781e-03,0.991`

The MBQ for ref and alt is 28 and 30, i.e. an error rate of roughly 0.001 (Phred 30 = 10^-3). 236*0.001 = 0.236 expected error reads, which is much lower than the 6 alt reads, so the site passed. You once answered a similar question this way.
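
The back-of-the-envelope estimate above can be written out in a few lines of Python. This is a sketch of that rough heuristic only (Phred Q converted to an error rate of 10^(-Q/10), multiplied by depth), assuming every read at the site has the same error rate; it is not what Mutect2 computes.

```python
def phred_to_error(q):
    """Convert a Phred-scaled quality to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)

def expected_error_reads(depth, q):
    """Expected number of erroneous reads at a site, assuming (simplification)
    that every read has the same error probability implied by quality q."""
    return depth * phred_to_error(q)

# Site chr20:57484460 from the record above: DP=236, MBQ ref,alt = 28,30, AD = 230,6
alt_reads = 6
expected = expected_error_reads(236, 30)
print(f"alt MBQ 30 -> error rate {phred_to_error(30):.4f}")
print(f"expected error reads ~ {expected:.3f} vs observed alt reads {alt_reads}")
```

Note that the ref MBQ of 28 corresponds to a slightly higher error rate (about 0.0016); using it instead of 30 still gives well under one expected error read at this depth.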

To be clear, the algorithm of Mutect2 is much more sophisticated than this. However, back-of-the-envelope calculations such as the above can still give a rough sense of things. To understand what Mutect2 actually does one has to read the documentation.

The documentation is really difficult to understand. Some concrete examples, such as walking through a real variant site to show how Mutect2 calculates its result, would make it much friendlier for ordinary users like us.

Thanks a lot.

For example, take the following site:

`chr18 48593530 . TA T . clustered_events;t_lod DP=234;ECNT=5;POP_AF=5.000e-08;P_CONTAM=0.00;P_GERMLINE=-5.263e+01;RPA=2,1;RU=A;STR;TLOD=3.15 GT:AD:AF:DP:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:OBAM:OBAMRC:ORIGINAL_CONTIG_MISMATCH:SA_MAP_AF:SA_POST_PROB 0/1:191,3:0.020:194:191,3:0,0:22,26:171,171:60:16:false:false:0:0.020,0.00,0.015:0.013,3.253e-03,0.984`

I know the tumor LOD threshold defaults to 5.3 and TLOD here is only 3.15, which is why the site is filtered as t_lod. What concerns me is how Mutect2 distinguishes a sequencing error from a real somatic variant at a site like this.
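
For intuition on what a tumor LOD measures, here is a simplified version of the likelihood-ratio idea described for the original MuTect (v1): compare the likelihood of the reads if the alt allele is present at fraction f (estimated as alt/depth) against the likelihood if there is no variant (f = 0), and take log10 of the ratio. This sketch assumes one error rate per allele taken from MBQ and a substitution-style e/3 miscall probability, both simplifications. Mutect2 does not work exactly this way (among other differences, it computes per-read likelihoods with a pair-HMM after local reassembly), so treat the numbers as a rough illustration, not a reproduction of TLOD.

```python
import math

def phred_to_error(q):
    """Phred quality -> error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)

def read_likelihood(f, supports_alt, e):
    """P(observed read | alt allele fraction f) under a simple error model:
    a read reports its true allele with prob 1 - e and the other allele with
    prob e / 3 (substitution-style assumption, not Mutect2's indel model)."""
    p_if_alt = (1 - e) if supports_alt else e / 3
    p_if_ref = e / 3 if supports_alt else (1 - e)
    return f * p_if_alt + (1 - f) * p_if_ref

def simple_tumor_lod(ref_reads, alt_reads, ref_q, alt_q):
    """log10 L(f = f_hat) / L(f = 0), with f_hat = alt / depth and one error
    rate per allele taken from MBQ (every read of an allele gets the same e)."""
    f_hat = alt_reads / (ref_reads + alt_reads)
    e_ref, e_alt = phred_to_error(ref_q), phred_to_error(alt_q)
    ref_term = ref_reads * math.log10(
        read_likelihood(f_hat, False, e_ref) / read_likelihood(0.0, False, e_ref))
    alt_term = alt_reads * math.log10(
        read_likelihood(f_hat, True, e_alt) / read_likelihood(0.0, True, e_alt))
    return ref_term + alt_term

# AD and MBQ values from the two records in this thread
site1 = simple_tumor_lod(230, 6, 28, 30)   # chr20:57484460, reported TLOD=8.76
site2 = simple_tumor_lod(191, 3, 22, 26)   # chr18:48593530, reported TLOD=3.15
for name, lod in [("chr20:57484460", site1), ("chr18:48593530", site2)]:
    print(f"{name}: simple LOD = {lod:.2f} (t_lod threshold 5.3)")
```

Each alt read contributes a large positive term, because an alt read is far more probable under f = f_hat than as a pure error (e/3); each ref read contributes a small negative term. With only 3 alt reads at lower base quality, the second site accumulates much less evidence, which is the same reason its LOD falls below the threshold.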

Here the alt base quality is 26, so the error rate is about 0.0025, and 194*0.0025 ≈ 0.49 expected error reads, which also seems fine compared with the 3 alt reads.
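
The ~0.49 figure is only an expected count. A slightly sharper version of the same heuristic asks how likely it is to see at least 3 error reads by chance. This sketch assumes reads are independent and that every error shows up as this alt allele (both simplifications, and still not Mutect2's model).

```python
import math

def phred_to_error(q):
    """Phred quality -> error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)

def prob_at_least_k_errors(depth, q, k):
    """P(X >= k) for X ~ Binomial(depth, e), with e the Phred-derived error rate.
    Assumes independent reads and that every error produces the alt allele."""
    e = phred_to_error(q)
    p_less = sum(math.comb(depth, i) * e**i * (1 - e) ** (depth - i) for i in range(k))
    return 1 - p_less

# Site chr18:48593530: DP=194, alt MBQ=26, 3 alt reads
depth, alt_q, alt_reads = 194, 26, 3
expected = depth * phred_to_error(alt_q)
tail = prob_at_least_k_errors(depth, alt_q, alt_reads)
print(f"expected error reads ~ {expected:.2f}")
print(f"P(>= {alt_reads} error reads) ~ {tail:.4f}")
```

Because a whole exome or genome contains a very large number of candidate sites, even a small per-site probability like this one means some sites will show a few error reads purely by chance.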

So can you analyse this site using this method?