# Heterozygosity

### Heterozygosity in population genetics

In the context of population genetics, heterozygosity can refer to the fraction of individuals in a given population that are heterozygous at a given locus, or the fraction of loci that are heterozygous in an individual. See the Wikipedia entries on Heterozygosity and Coalescent Theory as well as the book "Population Genetics: A Concise Guide" by John H. Gillespie for further details on related theory.
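Both senses of the definition reduce to the same counting operation, applied either across individuals at one locus or across loci within one individual. The following is a minimal sketch, assuming diploid genotypes represented as pairs of allele strings; the function name `fraction_heterozygous` and the example data are hypothetical and used only for illustration.

```python
def fraction_heterozygous(genotypes):
    """Return the fraction of diploid genotypes whose two alleles differ.

    `genotypes` is a non-empty list of (allele1, allele2) pairs. The pairs can
    be one locus across many individuals, or many loci in one individual.
    """
    heterozygous = sum(1 for a, b in genotypes if a != b)
    return heterozygous / len(genotypes)


# Hypothetical data: four individuals genotyped at a single locus.
one_locus_many_individuals = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]
print(fraction_heterozygous(one_locus_many_individuals))
```

Two of the four genotypes in the example are heterozygous, so the function returns 0.5.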

### Heterozygosity in GATK

In GATK genotyping, we use an "expected heterozygosity" value to compute the prior probability that a locus is non-reference. Given the expected heterozygosity `hets`, we calculate the probability of N samples being hom-ref at a site as `1 - sum_{i=1}^{2N} (hets / i)`, that is, one minus the sum of `hets / i` for i from 1 to 2N. The default value provided for humans is `hets = 1e-3`; a value of 0.001 implies that two randomly chosen chromosomes from the population of organisms would differ from each other at a rate of 1 in 1000 bp. In this context `hets` is analogous to the parameter `theta` from population genetics. The `hets` parameter value can be modified if desired.
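The hom-ref prior can be worked through numerically. The sketch below assumes the sum runs over i = 1 to 2N; the function name `hom_ref_prior` is hypothetical and this is not GATK's own code.

```python
def hom_ref_prior(hets: float, n_samples: int) -> float:
    """Prior probability that all N samples are hom-ref at a site.

    Computes 1 - sum_{i=1}^{2N} (hets / i), assuming the sum runs over
    i = 1 .. 2N as in the formula above.
    """
    n_chromosomes = 2 * n_samples
    return 1.0 - sum(hets / i for i in range(1, n_chromosomes + 1))


# With the default human value and a single sample:
# 1 - (0.001 / 1 + 0.001 / 2) = 0.9985
print(hom_ref_prior(1e-3, 1))

# The prior on hom-ref shrinks slowly as more samples are added.
for n in (1, 10, 100):
    print(n, hom_ref_prior(1e-3, n))
```

Because the sum grows only like a harmonic series, adding samples lowers the hom-ref prior slowly, while raising `hets` lowers it proportionally.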

Note that this quantity has nothing to do with the likelihood of any given sample having a heterozygous genotype, which in the GATK is purely determined by the probability of the observed data P(D | AB) under the model that there may be an AB heterozygous genotype. The posterior probability of this AB genotype would use the `hets` prior, but the GATK only uses this posterior probability in determining the probability that a site is polymorphic. So changing the `hets` parameter only changes the chance that a site will be called non-reference across all samples (a larger value increases it); it does not change the output genotype likelihoods at all, because these are not *posterior* probabilities. The one quantity that changes whether the GATK considers the possibility of a heterozygous genotype at all is the *ploidy*, which describes how many copies of each chromosome each individual in the species carries.
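The distinction between likelihoods and the site-level posterior can be illustrated with a toy calculation. This is a simplified single-sample sketch, not GATK's actual implementation: it assumes a diploid sample with genotypes AA, AB, and BB, and assigns a prior of `hets / i` to a site carrying i non-reference alleles, matching the `hets / i` terms of the hom-ref formula given earlier. The function name and the likelihood values are hypothetical.

```python
def site_nonref_posterior(likelihoods, hets):
    """Toy posterior probability that a site is non-reference.

    `likelihoods` holds [P(D | AA), P(D | AB), P(D | BB)] for one diploid
    sample. Assumption: the prior for i non-reference alleles is hets / i,
    and the hom-ref prior is whatever probability remains.
    """
    priors = [1.0 - hets - hets / 2.0, hets / 1.0, hets / 2.0]
    weighted = [lik * prior for lik, prior in zip(likelihoods, priors)]
    return 1.0 - weighted[0] / sum(weighted)


# Hypothetical likelihoods from data that favor the AB genotype.
likelihoods = [1e-4, 0.9, 1e-4]

for hets in (1e-3, 1e-2):
    # The likelihoods list is never modified; only the posterior moves.
    print(hets, likelihoods, site_nonref_posterior(likelihoods, hets))
```

The same likelihoods go in for both values of `hets`; only the posterior that the site is non-reference moves, and it rises as `hets` increases.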