[HARD FILTER QUESTION]
I am calling SNPs (HaplotypeCaller, GATK3) in a sample of 70 low-coverage (3-5x) genomes of a non-model organism. When I plot the distribution of QD, I get a really odd distribution (see attached; please remove the spaces from the link) (h t t p s://us.v-cdn.net/5019796/uploads/editor/lj/cepi62d878al.png). I'm not sure which filter value I should choose (2 does not seem stringent enough). Do you have any advice?
P.S. this is prior to BQSR.
Thanks for the help.
Because this is low-coverage data, we do expect to see this kind of distribution. Choosing a filter value is more of a judgment call, and it takes some trial and error. You want to remove as many bad variant calls as possible without losing good data. You could try a more stringent QD threshold (maybe around 4), see how many variants that filters out, and then make a call based on that.
Without looking at the data myself, that is unfortunately all I can suggest.
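To make the trial-and-error step concrete, here is a minimal sketch in plain Python that counts how many variants would fall below a few candidate QD thresholds. It assumes an uncompressed, tab-delimited VCF with QD in the INFO column; the function names (`parse_qd`, `count_filtered`) and the toy records are hypothetical and only there for illustration, so adapt it to your own file before relying on the numbers.

```python
def parse_qd(vcf_line):
    """Return the QD value of one VCF data line as a float, or None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # INFO is the 8th column in a VCF record
        return None
    for entry in fields[7].split(";"):
        if entry.startswith("QD="):
            return float(entry[3:])
    return None


def count_filtered(vcf_lines, thresholds):
    """Count variants with QD below each candidate threshold.

    Returns (number of variants carrying a QD value, {threshold: count}).
    """
    qds = []
    for line in vcf_lines:
        if line.startswith("#"):  # skip header lines
            continue
        qd = parse_qd(line)
        if qd is not None:
            qds.append(qd)
    counts = {t: sum(1 for q in qds if q < t) for t in thresholds}
    return len(qds), counts


# Toy records, invented for illustration only (not real data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=10;QD=1.5",
    "chr1\t200\t.\tC\tT\t80\t.\tDP=12;QD=3.2",
    "chr1\t300\t.\tG\tA\t200\t.\tDP=9;QD=22.0",
    "chr1\t400\t.\tT\tC\t150\t.\tQD=12.4;DP=11",
]

total, removed = count_filtered(example, [2.0, 4.0])
for threshold, n in removed.items():
    print(f"QD < {threshold}: {n} of {total} variants would be filtered")
```

To run it on real calls, pass an open file handle instead of the toy list, for example `count_filtered(open("calls.vcf"), [2.0, 3.0, 4.0])`, where `calls.vcf` is a placeholder for your own file name. Comparing the counts across thresholds gives a rough sense of how much data each cutoff would cost.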