Threshold for some items

omid
edited August 2012

To filter exome sequence data and remove false positives, I know that read depth and Phred score are routinely applied.

But there are the following quality-related items for which I would like to know: is there any threshold/cutoff for them? And at which step of the filtration strategy should I apply them (at the beginning or at the end)?

GC = GC content within 20 bp +/- the variant

HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction
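
To make the GC and HRun definitions above concrete, here is a minimal Python sketch. It is only one reading of those two definitions, not GATK's implementation: the function names `gc_content` and `homopolymer_run` are hypothetical, positions are assumed to be 0-based indices into a reference string, GC is returned as a fraction rather than a percentage, and HRun is taken as the longer of the left and right runs of the variant base.

```python
def gc_content(ref_seq, pos, flank=20):
    """Fraction of G/C bases within `flank` bp on each side of 0-based `pos`.

    Assumption: the window includes the variant position itself and is
    clipped at the ends of the reference sequence.
    """
    start = max(0, pos - flank)
    end = min(len(ref_seq), pos + flank + 1)
    window = ref_seq[start:end].upper()
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def homopolymer_run(ref_seq, pos, alt_base):
    """Longest run of `alt_base` directly left or right of 0-based `pos`.

    Assumption: "in either direction" means the larger of the two runs,
    not their sum, and the variant position itself is not counted.
    """
    seq = ref_seq.upper()
    alt = alt_base.upper()

    left = 0
    i = pos - 1
    while i >= 0 and seq[i] == alt:
        left += 1
        i -= 1

    right = 0
    i = pos + 1
    while i < len(seq) and seq[i] == alt:
        right += 1
        i += 1

    return max(left, right)


# Toy reference: an A>T variant at index 8, just after a run of four T's.
toy_ref = "GGCATTTTACG"
hrun = homopolymer_run(toy_ref, 8, "T")
gc = gc_content(toy_ref, 8, flank=2)
```

On the toy sequence, the variant sits next to a run of four T's, so a high HRun value here would flag a site where a spurious T call is plausible.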

HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call


**MQ0Fraction** = Fraction of reads with mapping quality zero (the RMS, or root mean square, mapping quality is the separate MQ annotation). Regions of excessively low mapping quality are ambiguously mapped, and variants called within them are suspicious

**SB** = Strand Bias

**BaseQualityRankSumTest** = The u-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (ref bases vs. bases of the alternate allele).
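
As an illustration of what a "u-based z-approximation" means, the following Python sketch computes a Mann-Whitney U statistic for the alternate-allele base qualities and converts it to a z-score using the normal approximation. It is a simplified sketch under stated assumptions, not GATK's code: the function name `rank_sum_z` is hypothetical, ties get average ranks but no tie correction is applied to the variance, and the sign is chosen so that a negative value means the alternate-allele bases tend to have lower qualities than the reference bases.

```python
import math


def rank_sum_z(ref_quals, alt_quals):
    """Normal-approximation z-score of the Mann-Whitney U for alt vs. ref.

    Returns None when either group is empty, since the test is undefined.
    """
    n_ref, n_alt = len(ref_quals), len(alt_quals)
    if n_ref == 0 or n_alt == 0:
        return None

    # Pool both groups and sort by quality, remembering the group label.
    pooled = sorted(
        [(q, "ref") for q in ref_quals] + [(q, "alt") for q in alt_quals],
        key=lambda item: item[0],
    )

    # Sum the ranks of the alt observations, giving tied values the
    # average of the ranks they span.
    rank_sum_alt = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # ranks i+1 .. j share this average
        n_alt_tied = sum(1 for k in range(i, j) if pooled[k][1] == "alt")
        rank_sum_alt += avg_rank * n_alt_tied
        i = j

    # U statistic for the alt group, then its z-score under the null.
    u_alt = rank_sum_alt - n_alt * (n_alt + 1) / 2.0
    mean_u = n_ref * n_alt / 2.0
    sd_u = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12.0)
    return (u_alt - mean_u) / sd_u


# Toy example: alt bases with clearly lower qualities than ref bases.
z_low_alt = rank_sum_z([35, 36, 37], [10, 11, 12])
```

With the toy input, every alternate-allele quality ranks below every reference quality, so the resulting z-score is negative; identical quality distributions give a value of zero.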


