
Threshold for some items

omidomid Member
edited August 2012 in Ask the GATK team

To filter exome sequence data and remove false positives, I know that read depth and Phred score are routinely applied.

But there are the following quality-related items for which I would like to know: is there any threshold/cutoff for them? And at which step of the filtration strategy should I apply them (at the beginning or at the end)?

GC = GC content within 20 bp +/- the variant

HRun = Largest Contiguous Homopolymer Run of Variant Allele In Either Direction

HW = Phred-scaled p-value for Hardy-Weinberg violation. Extreme variations on heterozygous calls indicate a false positive call


MQ0Fraction = RMS (Root Mean Square, also known as quadratic mean) Mapping Quality. Regions of excessively low mapping quality are ambiguously mapped and variants called within are suspicious

SB = Strand Bias

BaseQualityRankSumTest = The u-based z-approximation from the Mann-Whitney Rank Sum Test for base qualities (ref bases vs. bases of the alternate allele).
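
To make a few of these definitions concrete, here is a rough Python sketch of what GC, HRun and a Phred-scaled p-value (as in HW) measure. This is only an illustration of the definitions quoted above, not GATK's actual implementation: the function names, the 0-based position convention and the toy reference sequence are all assumptions made for the example.

```python
import math


def gc_content(ref_seq, pos, flank=20):
    """Fraction of G/C bases in the window pos-flank .. pos+flank.

    `pos` is a 0-based index into `ref_seq` (an assumption of this sketch).
    The window is clipped at the ends of the sequence.
    """
    start = max(0, pos - flank)
    end = min(len(ref_seq), pos + flank + 1)
    window = ref_seq[start:end].upper()
    if not window:
        return 0.0
    return sum(base in "GC" for base in window) / len(window)


def homopolymer_run(ref_seq, pos, allele):
    """Longest contiguous run of `allele` directly left or right of `pos`."""
    seq = ref_seq.upper()
    allele = allele.upper()

    # Count matching bases walking left from the variant position.
    left = 0
    i = pos - 1
    while i >= 0 and seq[i] == allele:
        left += 1
        i -= 1

    # Count matching bases walking right from the variant position.
    right = 0
    i = pos + 1
    while i < len(seq) and seq[i] == allele:
        right += 1
        i += 1

    return max(left, right)


def phred_scale(p_value):
    """Phred-scale a p-value: -10 * log10(p). Returns inf for p <= 0."""
    if p_value <= 0:
        return float("inf")
    return -10.0 * math.log10(p_value)


# Toy reference sequence, invented purely for illustration.
toy_ref = "AAAAGGGCCTTTT"
print(gc_content(toy_ref, 6, flank=2))
print(homopolymer_run(toy_ref, 4, "A"))
print(phred_scale(0.001))
```

With helpers like these, a candidate cutoff could be tried as a simple comparison per variant. Which cutoff values are sensible is exactly the open question of this post.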

