
SB flag in VCF file

mayaab (Israel, Member)

Again, thanks a lot for the amazing workshop in Brussels!
I have a question about dealing with strand bias, specifically the SB flag in the VCF file:
The SB (strand bias) value is calculated with Fisher's exact test on a 2x2 table of reference and non-reference read depths on the forward and reverse strands. Playing with the numbers in a web calculator, I noticed that combinations that look like clear strand bias get a non-significant value. For example, 30, 1, 110, 2 (ref-fwd, ref-reverse, non-ref-fwd, non-ref-reverse) gives a p-value of 0.52, or 2.84 when Phred-scaled.
Such variants are therefore considered unbiased.
The cases that are flagged as biased are ones where three of the four values are similar to each other and only one is extremely different.
As far as I understand, cases like the one I mentioned should be considered biased. Do you recommend using this strand bias value and filtering variants based on it?
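To make the calculation in the question reproducible, here is a minimal sketch of a plain two-sided Fisher's exact test on the 2x2 strand table, using only the Python standard library. It is an illustration under stated assumptions, not GATK's own implementation (GATK's code may differ in details such as tie tolerance or handling of very large counts); the names `fisher_two_sided` and `phred` are hypothetical.

```python
from math import comb, log10


def fisher_two_sided(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Two-sided Fisher's exact test p-value for the table
        [[ref_fwd, ref_rev],
         [alt_fwd, alt_rev]]
    Row and column totals are held fixed; the p-value is the total
    probability of all tables that are no more likely than the observed one."""
    ref_total = ref_fwd + ref_rev  # reads supporting the reference allele
    alt_total = alt_fwd + alt_rev  # reads supporting the alternate allele
    rev_total = ref_rev + alt_rev  # reads on the reverse strand
    n = ref_total + alt_total

    # Unnormalized hypergeometric weight of the table in which x reference
    # reads fall on the reverse strand.
    def weight(x: int) -> int:
        return comb(ref_total, x) * comb(alt_total, rev_total - x)

    lo = max(0, rev_total - alt_total)
    hi = min(ref_total, rev_total)
    observed = weight(ref_rev)
    # Integer weights make the "no more likely than observed" comparison exact.
    tail = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    # All weights over the valid range sum to comb(n, rev_total).
    return tail / comb(n, rev_total)


def phred(p: float) -> float:
    """Phred-scale a p-value: -10 * log10(p)."""
    return -10 * log10(p)


if __name__ == "__main__":
    p = fisher_two_sided(30, 1, 110, 2)
    print(round(p, 2), round(phred(p), 2))  # 0.52 2.82
```

The exact p-value is about 0.522, so the Phred-scaled value is about 2.82; the 2.84 in the question comes from scaling the rounded 0.52. By contrast, a table where the reference reads are balanced but every alternate read is on the forward strand, such as `fisher_two_sided(50, 50, 30, 0)`, gives a p-value far below 0.001.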




  • ebanks (Broad Institute; Member, Broadie, Dev)

    Hi Maya,
    Why do you believe that the 30,1,110,2 case is biased? It looks completely unbiased to me.

  • mayaab (Israel, Member)

    Thanks for the answer. If that's so, there is something I don't understand: as far as I know, strand bias is when a position is covered significantly more by one strand than by the other. Isn't it?

  • Sheila (Broad Institute; Member, Broadie)


    Hi Maya,

    The reasoning here is that since the reference reads are also skewed toward one strand, there may be something unusual about this locus in the genome (maybe it is highly repetitive and the mapping is a little off, so many of the + reads are mapped correctly while the - reads are mapped somewhere else that looks similar). We accept the skew in the reference reads because it belongs to the reference allele and we can't correct it. So if the alt reads show the same skew, it is most likely caused by the same mechanism that produced the reference skew.


  • ebanks (Broad Institute; Member, Broadie, Dev)

    Also, your understanding of strand bias is not quite right. It's not that one strand is covered more than the other. Rather, bias occurs when the strands are covered differently for the reference and alternate alleles. In the case you described, the reference and alternate rows of the table are equally skewed, so it is not a case of strand bias.
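    To make the row-versus-row comparison concrete, a small sketch that computes, for each allele, the fraction of its reads on the forward strand. In the sense described above, strand bias is a difference between these two fractions, not overall imbalance between strands. The helper name `strand_fractions` is hypothetical and is not a GATK function.

```python
def strand_fractions(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> tuple[float, float]:
    """Return (forward fraction of reference reads, forward fraction of alt reads)."""
    ref_frac = ref_fwd / (ref_fwd + ref_rev)
    alt_frac = alt_fwd / (alt_fwd + alt_rev)
    return ref_frac, alt_frac


if __name__ == "__main__":
    # Table from the question: both alleles are heavily forward-skewed,
    # and the skew is nearly identical between the two rows.
    print(strand_fractions(30, 1, 110, 2))  # about (0.968, 0.982)
    # Reference balanced, alternate entirely forward: the rows differ sharply.
    print(strand_fractions(50, 50, 30, 0))  # (0.5, 1.0)
```

    Because both rows in the question sit near 0.97 to 0.98, the Fisher test has nothing to distinguish them, which is why the p-value is not significant.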

  • mayaab (Israel, Member)

    Thanks a lot! Now I understand much better.
