How does UnifiedGenotyper estimate genotypes?

How does the UnifiedGenotyper estimate genotypes in multi-sample calling? The "Multi-sample SNP calling" section of the online methods in DePristo et al. (2011) says that the distribution of the site allele count, Pr(q=X | D), is calculated with a greedy search algorithm, which should also yield the most likely combination of genotypes across the samples. Is this how the genotypes are decided?

The same section also mentions that the distribution can be calculated with the exact summation from section 4.2.2 of Heng Li's "Mathematical Notes on Samtools Algorithms". Section 4.5 of that document ("Multi-sample SNP calling and genotyping") shows how genotypes can be estimated from the expectation of the allele count (obtained with the exact-sum method of section 4.2.2). So is this method used instead of the greedy search?

I'm curious because it seems the samtools equations could be used to build a genotype caller that handles samples with variable ploidy, for example on the X chromosome, but to my knowledge nothing like that exists. Thanks!
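To make the variable-ploidy idea concrete, here is a minimal Python sketch of an exact summation over genotype configurations, as I understand it. It is not taken from GATK or samtools: the function names, the input layout (one list of genotype likelihoods per sample, whose length minus one is that sample's ploidy) and the flat default prior are all my own assumptions. It also works in plain probability space, so it would underflow on real cohorts.

```python
from math import comb

def allele_count_likelihoods(sample_likelihoods):
    """Likelihood of each site allele count k, summed over all genotype configurations.

    sample_likelihoods[i][g] is assumed to be Pr(D_i | sample i carries g alt alleles),
    for g = 0..m_i, where the ploidy m_i is len(sample_likelihoods[i]) - 1.
    """
    # z[l]: weighted sum over the samples seen so far that carry l alt alleles in total
    z = [1.0]
    for lik in sample_likelihoods:
        ploidy = len(lik) - 1
        new_z = [0.0] * (len(z) + ploidy)
        for l, z_l in enumerate(z):
            for g, lik_g in enumerate(lik):
                # comb(ploidy, g) counts the orderings of g alt alleles in this sample
                new_z[l + g] += z_l * comb(ploidy, g) * lik_g
        z = new_z
    total_chromosomes = len(z) - 1
    # Divide by the number of ways to place k alt alleles on all chromosomes
    return [z[k] / comb(total_chromosomes, k) for k in range(total_chromosomes + 1)]

def allele_count_posterior(sample_likelihoods, prior=None):
    """Pr(k | D) under a prior over allele counts (flat if none is given)."""
    lik = allele_count_likelihoods(sample_likelihoods)
    if prior is None:
        prior = [1.0] * len(lik)  # assumption: flat prior, purely for illustration
    unnormalized = [l * p for l, p in zip(lik, prior)]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]

def expected_allele_count(posterior):
    """Posterior mean of the site allele count."""
    return sum(k * p for k, p in enumerate(posterior))
```

For example, a diploid sample would contribute three likelihoods and a haploid sample (say, a male on the X chromosome) two, and `allele_count_posterior([diploid, haploid])` would return a distribution over 0 to 3 alternate alleles. Nothing in the recursion requires the ploidies to match, which is what makes me think mixed ploidy is feasible.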

