calculating sigma for Depth of Coverage cutoffs
Hi,
I would like to filter variants based on max DP. I understand that in order to define the max DP cutoof there is a need to calculate the Sigma, as stated in GATK's best practices:
"The maximum DP (depth) filter only applies to whole genome data, where the probability of a site having exactly N reads given an average coverage of M is a wellbehaved function. First principles suggest this should be a binomial sampling but in practice it is more a Gaussian distribution. Regardless, the DP threshold should be set a 5 or 6 sigma from the mean coverage across all samples, so that the DP > X threshold eliminates sites with excessive coverage caused by alignment artifacts."
Hence, how can one calculate the sigma, and what is it exactly?
Thanks!
Sagi
Best Answer

Geraldine_VdAuwera Cambridge, MA admin
That sounds right to me. As always, make sure to examine your results carefully to make sure that this cutoff is appropriate for your data.
Answers
http://en.wikipedia.org/wiki/Standard_deviation
Thanks!
What tool would you recommend for this?
OK, so I found a nice tool for this: QualiMao (http://qualimap.bioinfo.cipf.es/).
Now, just to be sure, if I have a mean genome coverage of ~10X and a standard deviation (sigma) of 110, so does that mean that the max depth cutoff for variant calling should be: DP(max) = 10 + 5*110 = 560?
I'd appreciate if you can confirm this.
Thanks!
Sagi
That sounds right to me. As always, make sure to examine your results carefully to make sure that this cutoff is appropriate for your data.
Thanks Geraldine!
Helpful as always
Best,
Sagi
I try
Thanks for linking to that tool, looks like it could be useful to others. From what we can tell it's essentially a visualization engine for Picard tools, so you could get all those metrics using Picard from command line. But sometimes it is nice to have everything all plotted out for you!
Yes it's pretty neat. Even Though I've seen that there are some inconsistencies between the results generated via qualimap and those generated via samtools depth. I've addressed this issue via qualimap's developers and I am awaiting a reply. Once I get an answer I'll post it back here just in case so that everyone may benefit from it.
Best,
Sagi
@sagipolani
Hi,
I tried Qualimap using multisample mode. I can get the mean coverage for samples but what about the standard deviations for all samples? I also generated individual sample statistics using Qualimap and got sd for each sample. But I'm wondering how to set a Max DP in order to filter my multisample vcf?
Thanks
Sumudu