calculating sigma for Depth of Coverage cutoffs
I would like to filter variants based on max DP. I understand that in order to define the max DP cutoof there is a need to calculate the Sigma, as stated in GATK's best practices:
"The maximum DP (depth) filter only applies to whole genome data, where the probability of a site having exactly N reads given an average coverage of M is a well-behaved function. First principles suggest this should be a binomial sampling but in practice it is more a Gaussian distribution. Regardless, the DP threshold should be set a 5 or 6 sigma from the mean coverage across all samples, so that the DP > X threshold eliminates sites with excessive coverage caused by alignment artifacts."
Hence, how can one calculate the sigma, and what is it exactly?