# Explanation of strand bias detected in GATK

Member Posts: 9

Hi,

I have seen the definition of strand bias on this site (below) but I need a little clarification. Does the FS filter (a) highlight instances where reads are only present on a single strand and contain a variant (as may occur toward the end of exome capture regions) or does it (b) specifically look for instances where there are reads on both strands but the variant allele is disproportionately represented on one strand (as might be indicative of a false positive), or does it (c) do both?

I had thought it did (b) but have encountered some disagreement.

How much evidence is there for Strand Bias (the variation being seen on only the forward or only the reverse strand) in the reads? Higher SB values denote more bias (and therefore are more likely to indicate false positive calls).

(b) is correct. Note that the Strand Bias (SB) annotation is not the same as Fisher Strand (FS), so you cannot draw conclusions about one from the other.

Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT
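To make option (b) concrete, here is a minimal sketch of a Phred-scaled, two-sided Fisher's exact test on a 2x2 table of strand counts (reference vs. alternate allele, forward vs. reverse strand). This is only an illustration of the general idea behind a Fisher-based strand annotation. It is not GATK's actual implementation, and the function names are hypothetical.

```python
import math


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for a 2x2 strand table.

    Illustrative only: sums the probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row_ref = ref_fwd + ref_rev          # reads supporting the reference allele
    row_alt = alt_fwd + alt_rev          # reads supporting the alternate allele
    col_fwd = ref_fwd + alt_fwd          # reads on the forward strand
    total = row_ref + row_alt
    denom = math.comb(total, col_fwd)

    def prob(a):
        # Hypergeometric probability of seeing `a` forward reference reads.
        return math.comb(row_ref, a) * math.comb(row_alt, col_fwd - a) / denom

    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    p_observed = prob(ref_fwd)
    # Small relative tolerance so ties are not lost to floating-point noise.
    p_value = sum(
        prob(a)
        for a in range(lowest, highest + 1)
        if prob(a) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


def phred_scaled_strand_score(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scale the p-value: higher means stronger evidence of strand bias."""
    p_value = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return -10 * math.log10(max(p_value, 1e-320))


if __name__ == "__main__":
    # Case (a): every read is on the forward strand, for both alleles.
    print(phred_scaled_strand_score(10, 0, 10, 0))
    # Case (b): reads on both strands, but the variant only on the forward strand.
    print(phred_scaled_strand_score(10, 10, 10, 0))
```

In this sketch, case (a) scores zero because only one table is possible given the margins, so the test has nothing to compare. Case (b) gets a clearly positive score, which matches the answer above.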

• Member Posts: 9

So for the older SB, would the answer (a, b, or c) be different?

Theoretically, it is (b). But we've stopped using the SB annotation because we never got good results with it...

• Member Posts: 9

So (final point) FS is a filter that I can use with exome data? Again, I believed so but have encountered differing opinions.

Yes, please see our best practices documentation.

• Member Posts: 9

I had done so, but needed the added security of a human telling me so.

Thanks a lot for all the help.

That's assuming @ebanks is human

Geraldine Van der Auwera, PhD

• Member Posts: 9

Cyborgs are fine too!

• Member Posts: 10

@ebanks said:
Theoretically, it is (b). But we've stopped using the SB annotation because we never got good results with it...

Could you describe the problems with SB in more detail? Are they related to false positives or false negatives? I think SB was able to eliminate a lot of strand-biased SNPs. I was really surprised to see so many seemingly false SNPs called by a newer GATK version. I created VCFs with both a newer version (without SB) and an older version (with SB). It seems the newer version is calling SNPs whose bases are supported by only one strand, and these SNPs are near indels.
Thank you.

• Member Posts: 17

Hello,

I too have been wondering whether FS is the same as SB. I would like to filter out the strand-biased SNPs that Unified Genotyper has called in my VCF file. Would this be done using FS? If so, what cutoff value would separate SNPs that are NOT strand biased from those that are? I have searched a lot for this myself but have had no luck.

Thanks,
Sinan
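For anyone wanting to try an FS-based filter by hand, here is a minimal sketch that reads the `FS` value from the INFO column of VCF records and flags the ones above a cutoff. The default of 60.0 is an assumption: it is the starting point commonly cited for SNPs in GATK's hard-filtering recommendations, and it should be checked against the current best practices documentation and your own data. The function names and the example records are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;FS=3.2;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True   # flag-style keys have no value
    return info


def flag_strand_biased(vcf_lines, fs_threshold=60.0):
    """Return (chrom, pos, fs) for records whose FS exceeds the threshold.

    fs_threshold=60.0 is an assumed starting point, not a universal cutoff.
    """
    flagged = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                          # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        info = parse_info(fields[7])          # INFO is the 8th VCF column
        fs = info.get("FS")
        if isinstance(fs, str) and float(fs) > fs_threshold:
            flagged.append((fields[0], int(fields[1]), float(fs)))
    return flagged


if __name__ == "__main__":
    # Made-up records, for illustration only.
    example = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t1000\t.\tA\tG\t50\t.\tDP=40;FS=2.5",
        "1\t2000\t.\tC\tT\t50\t.\tDP=35;FS=75.3",
    ]
    for chrom, pos, fs in flag_strand_biased(example):
        print(chrom, pos, fs)
```

Records without an `FS` entry are left unflagged here. Whether to treat them as passing is a design choice that depends on the pipeline.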