I found 320 mutations among 16 mouse clones using GATK. However, I noticed something strange: in only 40 of those 0/1 calls do the reads supporting the ALT allele (1) outnumber the reads supporting the REF allele (0). In the remaining 280 mutations, the ALT-supporting reads are the minority, at about 10-40 percent of the reads.
If all 320 mutations were real heterozygous calls, I would expect the REF and ALT read counts to be about equal on average. If the ALT fraction were distributed symmetrically around 50%, I'd expect roughly half of these mutations to show more ALT reads and roughly half to show more REF reads. Is it natural to find fewer reads with the ALT call in genuine mutations?
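To put a rough number on how surprising this split is, here is a minimal Python sketch. It assumes each genuine heterozygous site independently has a 50% chance of showing more ALT than REF reads and ignores ties; the function name `binom_cdf` and the variable names are my own, not anything from GATK.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n_sites = 320          # total 0/1 calls
n_alt_majority = 40    # calls where ALT reads outnumber REF reads

# Assumption: under "all calls are real hets", each site is ALT-majority
# with probability 0.5 (ties ignored).
p_lower_tail = binom_cdf(n_alt_majority, n_sites, 0.5)
print(f"P(<= {n_alt_majority} of {n_sites} ALT-majority sites) = {p_lower_tail:.3g}")
```

If that assumption is anywhere near right, the printed probability is vanishingly small, which is what makes me doubt that all 320 calls are ordinary heterozygous mutations.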
I was thinking about using the binomial distribution in Excel to remove these, something like =BINOMDIST(20,100,0.50,FALSE), where I found 20 ALT reads at a depth of 100 and expected to see 50% ALT reads.
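For anyone who wants to reproduce the idea outside Excel, here is a rough Python equivalent using only the standard library. In Excel, the last argument FALSE returns the probability of exactly 20 ALT reads, while TRUE returns the cumulative probability of 20 or fewer; I suspect the cumulative form is the more natural one for a filter, but that is my assumption. The helper names and the `ALPHA` cutoff are hypothetical, picked only for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X == k) for X ~ Binomial(n, p); like BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); like BINOMDIST(k, n, p, TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

alt_reads = 20               # reads supporting ALT
depth = 100                  # total reads at the site
expected_alt_fraction = 0.5  # assumption: a clean heterozygous site

point = binom_pmf(alt_reads, depth, expected_alt_fraction)
tail = binom_lower_tail(alt_reads, depth, expected_alt_fraction)

ALPHA = 0.001  # hypothetical cutoff, not a recommended value
flag_for_removal = tail < ALPHA

print(f"P(exactly {alt_reads} ALT reads) = {point:.3g}")
print(f"P({alt_reads} or fewer ALT reads) = {tail:.3g}")
print(f"flag for removal: {flag_for_removal}")
```

With 20 ALT reads out of 100, both probabilities come out far below any cutoff I would consider, so this particular site would be flagged. Running the same test over all 320 sites would also need some correction for multiple testing.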