# --input-prior default value?

United Kingdom · Member ✭✭✭

#### Issue · GitHub, January 2017, by Sheila

Issue Number: 1595 · State: closed · Closed By: vdauwera

• United Kingdom · Member ✭✭✭
edited January 2017

Thanks a lot for this, @Geraldine_VdAuwera! It makes sense. However, your current documentation suggests to the reader that the priors are not flat:

For completely flat priors, specify the same value (=1/(2N+1)) 2N times, e.g. -inputPrior 0.33 -inputPrior 0.33 for the single-sample diploid case.
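For anyone wanting to check the arithmetic in that documentation line, here is a minimal Python sketch of the flat-prior rule. It is only an illustration: the helper names `flat_input_priors` and `as_cli_args` are made up for this example, and the `-inputPrior` flag spelling is simply copied from the quote above rather than checked against any particular GATK version.

```python
def flat_input_priors(n_samples: int) -> list[float]:
    """Return the 2N equal values described in the quoted documentation.

    Assumption (from the quote): a flat prior over 2N+1 states is
    specified by repeating 1/(2N+1) a total of 2N times.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    n_values = 2 * n_samples          # number of values to specify (diploid samples)
    value = 1.0 / (n_values + 1)      # 1/(2N+1)
    return [value] * n_values


def as_cli_args(priors: list[float], flag: str = "-inputPrior") -> str:
    """Format the values as repeated flag/value pairs, two decimals each."""
    return " ".join(f"{flag} {p:.2f}" for p in priors)


if __name__ == "__main__":
    # Single-sample diploid case: N = 1, so two values of 1/3.
    print(as_cli_args(flat_input_priors(1)))
```

For N = 1 this prints the same pair of arguments as the documentation example.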

Oh, hmm, that's a fair point. I have Q30 confidence that they are in fact flat, but I will check and clarify the doc.

• United Kingdom · Member ✭✭✭
edited January 2017

"Q30"? Not quite sure what that translates to on the VdA confidence scale.

I'm violating copyright law by pasting here from the supplementary information to the 2016 Nature paper "The Simons Genome Diversity Project: 300 genomes from 142 diverse populations" by Mallick et al., which seems to suggest that the priors are not flat by default:

http://www.nature.com/nature/journal/v538/n7624/extref/nature18964-s1.pdf

Most analyses in this paper are based on single-sample genotypes determined using a reference-bias free modification of GATK. We did not perform multi-sample genotyping as we were concerned that this could induce biases in population genetic analyses. Specifically, we were concerned that the GATK UnifiedGenotyper has a built-in prior for Bayesian SNP calling that assumes that the site is more likely to be homozygous for the reference allele than homozygous for the variant allele. For a diploid sample, the default priors for a homozygous reference, heterozygote and homozygous non-reference genotypes are (0.9985, 0.001, 0.0005), respectively. When there is ambiguity in a heterozygote, GATK prefers the reference homozygote. This is a reference bias, and while this bias is not typically problematic for medical studies, it can complicate interpretation of population genetics signals. With the Genome Sequencing and Analysis Group at the Broad Institute, we developed an alternative model that was integrated into the UnifiedGenotyper, allowing reference-bias free priors to be specified. We are using a prior (0.4995, 0.001, 0.4995). Details are at: https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php#--input_prior.

Alright, never mind anything I said above. After a bit of digging, it's more complicated than I thought, and it's not a version- or model-related thing. The priors you quote from the article are derived from the SNP (or indel) heterozygosity (each a scalar) using a pre-determined equation. If you don't want to follow the pre-determined equation, you use the `input_prior` argument, which, according to the dev I got this from, should override the `heterozygosity` argument. There are some notes about how heterozygosity is used here.
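To make the "pre-determined equation" concrete, here is a rough Python sketch. It assumes the prior for an allele count of i is θ/i, with the reference state taking whatever probability is left over. That form is an assumption on my part, not something confirmed in this thread, but with θ = 0.001 it does reproduce the (0.9985, 0.001, 0.0005) triple quoted from the paper. The function name `priors_from_heterozygosity` is hypothetical.

```python
def priors_from_heterozygosity(theta: float, n_samples: int = 1) -> list[float]:
    """Return priors for allele counts 0..2N, derived from heterozygosity.

    Assumed model (not confirmed by the thread): P(AC = i) = theta / i for
    i >= 1, and P(AC = 0) is the remaining probability mass.
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be between 0 and 1")
    n_chromosomes = 2 * n_samples                      # diploid samples
    non_ref = [theta / i for i in range(1, n_chromosomes + 1)]
    ref = 1.0 - sum(non_ref)                           # leftover mass for AC = 0
    if ref < 0.0:
        raise ValueError("theta is too large for this many samples")
    return [ref] + non_ref


if __name__ == "__main__":
    # Single diploid sample: hom-ref, het, hom-var.
    priors = priors_from_heterozygosity(0.001)
    print([round(p, 6) for p in priors])
```

Under this assumed model the heterozygosity scalar alone fixes all three default priors, which would explain why they are not flat unless `input_prior` is given.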