Apparently true SNPs with allele imbalance pass all hard filters except QD
I have been working on fine-tuning hard filtering for a set of 100 samples where we are capturing a very small target (500k bp).
Through this fine-tuning it has become clear to me that QD is probably the strongest filter. However, it has a cost: if one sets the limit at QD < 2, one filters out a non-negligible number of SNPs for which there is strong evidence that they are true.
I have taken a look at these SNPs to try to see why they are failing the QD filter.
Almost all of these SNPs are in heterozygous genotypes and have a strong allele imbalance, that is, the alternative allele is supported by quite a bit less than 50% of the covering bases. This explains why they fail QD: their QUAL is decent, although not great because of the allele imbalance, but they have excellent coverage and thus get heavily penalised by the depth correction. However, they pass all other filters (FS, HaplotypeScore, ReadPosRankSum, BaseQRankSum, MQRankSum, etc.). In addition, several of these SNPs are in curated databases of variants. Finally, some of these SNPs are found in multiple samples, and the imbalance between alleles is similar in different samples at the same site.
In summary, these variants appear to be true and are characterised by allele imbalance. But what is it that is causing this allele imbalance? How come we sequence more of one allele than the other?
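To make the QD argument concrete, here is a minimal Python sketch. It assumes a simplified definition of QD as QUAL divided by the depth at the site, which ignores the details of how the caller actually computes the annotation. The function names and all the numbers are hypothetical, chosen only for illustration and not taken from my data.

```python
def allele_balance(ref_depth, alt_depth):
    """Fraction of covering bases that support the alternative allele."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("site has no coverage")
    return alt_depth / total


def quality_by_depth(qual, depth):
    """Simplified QD: variant QUAL normalised by depth (an assumption)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return qual / depth


def fails_qd(qual, depth, threshold=2.0):
    """True if the site would be removed by a 'QD < threshold' hard filter."""
    return quality_by_depth(qual, depth) < threshold


# Hypothetical imbalanced het site: 170 ref reads, 30 alt reads, QUAL 300.
balance = allele_balance(170, 30)           # alt fraction well below 0.5
deep_site_fails = fails_qd(300.0, 200)      # same QUAL, deep coverage
shallow_site_fails = fails_qd(300.0, 60)    # same QUAL, shallower coverage
print(balance, deep_site_fails, shallow_site_fails)
```

With these made-up values, the same QUAL is filtered at the deeply covered site but kept at the shallower one, which is the pattern I am seeing: the filter is triggered by the combination of modest QUAL and high depth, not by any other sign of an artefact.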