
Allele Distributions


I found 320 mutations among 16 mouse clones using GATK. However, I noticed something strange: in only 40 of those 0/1 (heterozygous) calls are there more reads carrying the ALT allele (1) than reads carrying the REF allele (0). In the remaining 280 mutations, the reads that carry the ALT allele are the minority, at about 10 to 40 percent of the depth.
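A minimal sketch of how the counts above could be tallied, assuming each call's biallelic allelic depths are available as a "REF,ALT" string like the VCF AD FORMAT value (e.g. "80,20"). The function names, the input format and the example values are hypothetical; the 10 to 40 percent window is taken from the post.

```python
from typing import Dict, List, Tuple


def parse_ad(ad_field: str) -> Tuple[int, int]:
    """Split a biallelic AD value such as "80,20" into (REF depth, ALT depth)."""
    ref, alt = ad_field.split(",")[:2]
    return int(ref), int(alt)


def summarize_calls(ad_fields: List[str], low: float = 0.10, high: float = 0.40) -> Dict[str, int]:
    """Count ALT-majority calls and calls whose ALT fraction lies in [low, high]."""
    alt_majority = 0
    low_fraction = 0
    for field in ad_fields:
        ref, alt = parse_ad(field)
        depth = ref + alt
        if depth == 0:
            continue  # no informative reads at this site
        if alt > ref:
            alt_majority += 1
        if low <= alt / depth <= high:
            low_fraction += 1
    return {"total": len(ad_fields), "alt_majority": alt_majority, "low_fraction": low_fraction}


# Hypothetical AD values for four 0/1 calls.
print(summarize_calls(["80,20", "30,70", "60,40", "50,50"]))
```

Here only "30,70" counts as ALT-majority, while "80,20" and "60,40" fall in the 10 to 40 percent window.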

If all my 320 mutations were real, I would expect the REF and ALT read counts to be roughly equal. If that were true and the counts were normally distributed around 50%, I'd expect roughly half of these mutations to show more ALT reads and half to show more REF reads. Is it normal for genuine mutations to have fewer reads supporting the ALT allele?
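The "half and half" expectation above can be made quantitative with a sign test: if every call were a true heterozygote and each were equally likely to show more ALT reads or more REF reads, the number of ALT-majority calls out of 320 would follow Binomial(320, 0.5). This is a sketch under those assumptions (independent calls, no reference mapping bias, no REF/ALT ties); the function name is hypothetical.

```python
import math


def sign_test_lower_tail(k: int, n: int) -> float:
    """One-sided sign-test p-value: P(X <= k) for X ~ Binomial(n, 0.5)."""
    # Exact integer arithmetic, divided once at the end to avoid rounding error.
    return sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n


# 40 ALT-majority calls out of 320, as reported above.
print(sign_test_lower_tail(40, 320))
```

A very small result says only that the observed split is inconsistent with that idealized model; it does not by itself say which assumption fails.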

I was thinking about using the binomial distribution in Excel to remove these, with something like =BINOMDIST(20,100,0.5,FALSE) for a site where I found 20 ALT reads at a depth of 100 and expected 50% of the reads to carry the ALT allele.
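For reference, a Python sketch that mirrors Excel's BINOMDIST(number_s, trials, probability_s, cumulative). Note that with cumulative = FALSE it returns the probability of exactly 20 ALT reads, P(X = 20); a per-site filter would more usually use the cumulative tail P(X <= 20), i.e. cumulative = TRUE. The filter function and its alpha = 0.01 cutoff are hypothetical choices for illustration, not a GATK setting.

```python
import math


def binomdist(k: int, n: int, p: float, cumulative: bool) -> float:
    """Mirror of Excel BINOMDIST: P(X = k), or P(X <= k) when cumulative is True."""
    if cumulative:
        return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def passes_het_test(ref: int, alt: int, alpha: float = 0.01) -> bool:
    """Hypothetical filter: keep a 0/1 call unless its ALT count is improbably low at 50%."""
    depth = ref + alt
    return binomdist(alt, depth, 0.5, cumulative=True) >= alpha


print(binomdist(20, 100, 0.5, False))  # same quantity as =BINOMDIST(20,100,0.5,FALSE)
print(passes_het_test(80, 20))         # 20 ALT reads at depth 100
```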

