CalculateGenotypePosteriors and excess heterozygotes

tmo (Duke University, Member)

In a partially inbreeding species, we found an excess of called heterozygotes (a peak at observed/expected heterozygosity = 1.0). Is this because of CalculateGenotypePosteriors? Can we turn this off? Or better yet, can one set a prior on the ratio of called to expected heterozygosity (that is, an expected distribution of the inbreeding coefficient)?

The histogram of observed/expected heterozygosity is attached (with the low tail truncated).

Thanks!
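
To make the requested prior concrete: under the standard population-genetics model with inbreeding coefficient F and allele frequencies p and q = 1 - p, the genotype frequencies are p² + Fpq, 2pq(1 - F), and q² + Fpq. The sketch below turns those into genotype priors and combines them with per-genotype likelihoods by Bayes' rule. It is only an illustration of the idea in the question: the function names and the example likelihoods are hypothetical, and nothing here is a GATK option or API.

```python
def genotype_priors(p, f):
    """Return prior probabilities (hom_ref, het, hom_alt) for a biallelic site.

    p: reference allele frequency, f: inbreeding coefficient (both in [0, 1]).
    With f = 0 this reduces to Hardy-Weinberg proportions.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("p and f must both lie in [0, 1]")
    q = 1.0 - p
    hom_ref = p * p + f * p * q
    het = 2.0 * p * q * (1.0 - f)
    hom_alt = q * q + f * p * q
    return hom_ref, het, hom_alt


def genotype_posteriors(likelihoods, priors):
    """Combine per-genotype likelihoods with priors via Bayes' rule."""
    weighted = [lk * pr for lk, pr in zip(likelihoods, priors)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("all weighted likelihoods are zero")
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Hypothetical likelihoods that mildly favor the heterozygote.
    likelihoods = (0.2, 0.6, 0.2)
    for f in (0.0, 0.9):
        priors = genotype_priors(0.5, f)
        print(f, priors, genotype_posteriors(likelihoods, priors))
```

With the same weak evidence, the heterozygote posterior is much lower under F = 0.9 than under Hardy-Weinberg priors (F = 0), which is the effect an inbreeding-aware prior would have on low-coverage calls.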

Answers

  • tommycarstensen (United Kingdom, Member ✭✭✭)
    edited May 2015

    Stupid question: How do you know what your expected heterozygosity is? It could deviate from HWE. How many samples are you dealing with? Is this low-coverage data?

  • tmo (Duke University, Member)

    We know from previous studies that this species has an inbreeding coefficient around 0.9 (observed/expected heterozygosity = 0.1), so we expect most individuals will be fairly homozygous. This is genotyping by sequencing data with fairly high coverage, although individuals may have light coverage at a particular locus (192 individuals GBSed in 1 lane). However, more conservative filtering would bring its own biases.

    Given this level of inbreeding, we are cautious about inferred heterozygotes, and it is implausible to see many loci with inferred/expected heterozygosity near 1.0. The excess of inferred heterozygotes near the Hardy-Weinberg frequency suggests that some Hardy-Weinberg expectation is built into the genotype calls, but I have been unable to locate that in the documentation.

    Finally, to correct my original question, CalculateGenotypePosteriors was not used in these analyses.

    Thanks
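
For readers who want to reproduce the per-locus statistic discussed in this thread, here is a minimal sketch of how observed/expected heterozygosity and the implied inbreeding coefficient (F = 1 - observed/expected) could be computed from genotype counts at one biallelic locus. The function name and the example counts are hypothetical, expected heterozygosity is taken as 2pq from the called genotypes, and no small-sample correction is applied.

```python
def heterozygosity_summary(n_hom_ref, n_het, n_hom_alt):
    """Summarize heterozygosity at one biallelic locus from genotype counts.

    Returns None for a monomorphic locus, where expected heterozygosity is 0
    and the observed/expected ratio is undefined.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no called genotypes at this locus")
    # Reference allele frequency estimated from the called genotypes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    expected = 2.0 * p * (1.0 - p)  # Hardy-Weinberg expectation, 2pq
    observed = n_het / n
    if expected == 0.0:
        return None
    ratio = observed / expected
    return {
        "observed": observed,
        "expected": expected,
        "ratio": ratio,
        "f": 1.0 - ratio,  # implied inbreeding coefficient
    }


if __name__ == "__main__":
    # Hypothetical counts: one locus in Hardy-Weinberg proportions,
    # one locus with few heterozygotes as expected under strong inbreeding.
    print(heterozygosity_summary(25, 50, 25))
    print(heterozygosity_summary(90, 2, 8))
```

A histogram of the `ratio` values across loci corresponds to the plot described in the original question; under F around 0.9, most loci would be expected to fall near 0.1 rather than 1.0.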