# Heterozygosity

### Heterozygosity in population genetics

In the context of population genetics, heterozygosity can refer to the fraction of individuals in a given population that are heterozygous at a given locus, or the fraction of loci that are heterozygous in an individual. See the Wikipedia entries on Heterozygosity and Coalescent Theory as well as the book "Population Genetics: A Concise Guide" by John H. Gillespie for further details on related theory.

### Heterozygosity in GATK

In GATK genotyping, we use an "expected heterozygosity" value to compute the prior probability that a locus is non-reference. Given the expected heterozygosity `hets`, we calculate the probability of N samples being hom-ref at a site as `1 - sum_{i=1}^{2N} (hets / i)`. The default value provided for humans is `hets = 1e-3`; a value of 0.001 implies that two randomly chosen chromosomes from the population of organisms would differ from each other at a rate of 1 in 1000 bp. In this context `hets` is analogous to the parameter `theta` from population genetics. The `hets` parameter value can be modified if desired.
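
As a rough illustration of the formula above, the following Python sketch computes the hom-ref prior for a given `hets` and number of samples. The function name `hom_ref_prior` is hypothetical and this is not GATK's own code; it assumes the sum runs over alternate allele counts `i = 1 .. 2N` (i.e. diploid samples).

```python
def hom_ref_prior(hets: float, n_samples: int) -> float:
    """Prior probability that all n_samples are hom-ref at a site.

    Assumes diploid samples, so there are 2N chromosomes and the
    sum runs over alternate allele counts i = 1 .. 2N.
    """
    n_chromosomes = 2 * n_samples
    # Each non-zero alternate allele count i receives prior mass hets / i.
    p_non_ref = sum(hets / i for i in range(1, n_chromosomes + 1))
    return 1.0 - p_non_ref


if __name__ == "__main__":
    # Default human value from the text: hets = 1e-3.
    for n in (1, 10, 100):
        print(n, hom_ref_prior(1e-3, n))
```

With the default `hets = 1e-3` and a single sample, the prior for hom-ref is approximately 0.9985, and it decreases slowly as more samples are added, because the sum grows roughly like the logarithm of 2N.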

Note that this quantity has nothing to do with the likelihood of any given sample having a heterozygous genotype, which in the GATK is purely determined by the probability of the observed data P(D | AB) under the model that there may be an AB heterozygous genotype. The posterior probability of this AB genotype would use the `hets` prior, but the GATK only uses this posterior probability in determining the probability that a site is polymorphic. So changing the `hets` parameter only changes the chance that a site will be called non-reference across all samples (raising it makes such calls more likely), but doesn't actually change the output genotype likelihoods at all, as these aren't *posterior* probabilities. The one quantity that changes whether the GATK considers the possibility of a heterozygous genotype at all is the *ploidy*, which describes how many copies of each chromosome each individual in the species carries.
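
To make the distinction between the prior and the likelihoods concrete, here is a toy two-hypothesis Bayes calculation in Python. It is a deliberately simplified sketch, not GATK's actual genotyping model: the function names are hypothetical, the likelihood values are made up for illustration, and the site is reduced to just "all hom-ref" versus "non-reference".

```python
def non_ref_prior(hets: float, n_samples: int) -> float:
    """Prior probability that a site is non-reference (diploid assumption)."""
    return sum(hets / i for i in range(1, 2 * n_samples + 1))


def posterior_non_ref(lik_hom_ref: float, lik_non_ref: float, prior: float) -> float:
    """Posterior probability that the site is non-reference (Bayes' rule)."""
    weighted_non_ref = lik_non_ref * prior
    weighted_hom_ref = lik_hom_ref * (1.0 - prior)
    return weighted_non_ref / (weighted_non_ref + weighted_hom_ref)


if __name__ == "__main__":
    # Made-up data likelihoods; they depend only on the reads, not on hets.
    lik_hom_ref, lik_non_ref = 0.01, 0.5

    for hets in (1e-3, 1e-2):
        prior = non_ref_prior(hets, n_samples=1)
        post = posterior_non_ref(lik_hom_ref, lik_non_ref, prior)
        # The likelihoods are identical in both iterations; only the
        # site-level posterior moves with hets.
        print(hets, lik_hom_ref, lik_non_ref, post)
```

In this toy setting, a larger `hets` raises the posterior probability that the site is non-reference, while the likelihood inputs are untouched, which mirrors the point made above about genotype likelihoods not being posterior probabilities.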