Inbreeding coefficient calculation documentation available?

Is there a description available for the inbreeding coefficient calculation used for variant recalibration?
An overview is found here:
https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_annotator_InbreedingCoeff.php
and that page points here for details of the method:
document on statistical tests
but there is no description of the inbreeding coefficient (just the rank sum test).
@jfarrell
Hi,
I haven't had a chance to finish the statistics documentation for the annotations yet. I will look into the inbreeding coefficient and get back to you.
-Sheila
@jfarrell
Hi,
From @gauthier:
The InbreedingCoeff is 1 - (# observed hets)/(# expected hets), where the population allele frequency is estimated from the sample genotypes. The number of expected hets comes from the random mating assumption and the proportions of ref and alt alleles in the population: with ref frequency p and alt frequency q, the probability that a sample is het is Prob(ref from parent1)Prob(alt from parent2) + Prob(alt from parent1)Prob(ref from parent2) = 2pq, so the expected number of hets is 2pq times the number of samples. (The factor of two covers the two outcomes: alt from mom or alt from dad.) Negative values of InbreedingCoeff mean there are too many hets, which suggests a site with bad mapping; this is why we filter out the variants with the most negative InbreedingCoeff values for ExAC. (Positive values can arise from admixture of different ethnic populations, as in ExAC, e.g. at a site where Finns are all hom var but Taiwanese are all hom ref.)
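For illustration, here is a minimal Python sketch of the formula described above for a single biallelic site. It assumes hard genotype calls encoded as alt-allele counts (0 = hom ref, 1 = het, 2 = hom var). The function name `inbreeding_coeff` is hypothetical and this is not GATK's code; the actual annotation may differ in details such as handling of multiallelic sites, small cohorts, or genotype likelihoods.

```python
from collections import Counter


def inbreeding_coeff(genotypes):
    """Return F = 1 - observed_hets / expected_hets for one biallelic site.

    genotypes: alt-allele count per diploid sample (0, 1 or 2).
    Hypothetical helper for illustration; not the GATK implementation.
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    # Alt allele frequency q is estimated from the sample genotypes themselves.
    q = (counts[1] + 2 * counts[2]) / (2 * n)
    p = 1 - q
    # Random mating: each sample is het with probability 2pq.
    expected_hets = 2 * p * q * n
    if expected_hets == 0:
        return None  # monomorphic site, F is undefined
    return 1 - counts[1] / expected_hets


hwe_site = [0] * 25 + [1] * 50 + [2] * 25  # Hardy-Weinberg proportions
all_het_site = [1] * 100                   # excess hets (e.g. bad mapping)
admixed_site = [2] * 50 + [0] * 50         # two subpopulations fixed for different alleles

print(inbreeding_coeff(hwe_site))      # 0.0
print(inbreeding_coeff(all_het_site))  # -1.0
print(inbreeding_coeff(admixed_site))  # 1.0
```

The three toy sites match the behaviour described above: Hardy-Weinberg proportions give 0, a site where every sample is het gives -1, and a cohort split into a hom var half and a hom ref half gives 1.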
You can also refer to this paper, which another developer pointed out: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1199378/
I will fix the documentation asap.
-Sheila
Thanks! How robust is this filter to violations of the assumption that the samples are unrelated? Would it still work for a large number of family samples, or would it be best calculated using only the unrelated individuals in the sequenced sample?
@jfarrell
Hi,
It is not very robust. We have found that relatedness does break the assumptions the inbreeding coefficient is based on. For family samples, it really depends on how many families and samples you have. For example, if your cohort consists of only 3 families, the inbreeding coefficient is not going to work. But if you have 10,000 samples and just a few families, it should be fine.
-Sheila
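To make the relatedness point concrete, here is a toy Python example using a hypothetical `inbreeding_coeff` sketch (hard calls, biallelic site; not GATK's code). The cohort is invented: 12 samples from 3 families of 4 (two parents and two children), where one parent in a single family carries a rare alt allele and both children inherited it. Because those hets are shared by descent rather than drawn independently, F comes out more negative than for 12 unrelated samples carrying the allele once, pushing it toward the values the filter treats as suspicious.

```python
from collections import Counter


def inbreeding_coeff(genotypes):
    """F = 1 - observed_hets / expected_hets for one biallelic site (hypothetical helper)."""
    n = len(genotypes)
    counts = Counter(genotypes)
    q = (counts[1] + 2 * counts[2]) / (2 * n)  # alt allele frequency from the genotypes
    expected_hets = 2 * (1 - q) * q * n        # random-mating expectation
    if expected_hets == 0:
        return None  # monomorphic site, F is undefined
    return 1 - counts[1] / expected_hets


# Invented cohort: 3 families of 4 samples (parent, parent, child, child).
# In family 1 one parent is het and both children inherited the alt allele.
family_1 = [1, 0, 1, 1]
families_2_and_3 = [0, 0, 0, 0] * 2
related_cohort = family_1 + families_2_and_3

# Same size, but the alt allele appears once among unrelated samples.
unrelated_cohort = [1] + [0] * 11

print(round(inbreeding_coeff(related_cohort), 3))    # -0.143
print(round(inbreeding_coeff(unrelated_cohort), 3))  # -0.043
```

In this toy model, embedding the same family in a cohort of thousands of unrelated samples shrinks its share of the cohort, and with it the shift in F, which is consistent with the advice above.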