
--input_prior (UG)

I'm a bit confused by the instructions for --input_prior in UnifiedGenotyper, but the option seems quite useful. Is there any chance you could clarify or simplify them?

--input_prior / -inputPrior
Input prior for calls
By default, the prior specified with the argument --heterozygosity/-hets is used for variant discovery at a particular locus, using an infinite sites model; see e.g. Watterson (1975) or Tajima (1996). This model asserts that the probability of having a population of k variant sites in N chromosomes is proportional to theta/k, for k=1:N. There are instances where using this prior might not be desirable, e.g. for population studies where the prior might not be appropriate, as for example when the ancestral status of the reference allele is not known. By using this argument, the user can manually specify the priors to be used for calling as a vector of doubles, with the following restrictions:
a) The user must specify 2N values, where N is the number of samples.
b) Only diploid calls are supported.
c) Probability values are specified in double format, in linear space.
d) No negative values are allowed.
e) The values will be added and Pr(AC=0) will be 1-sum, so that they sum up to one.
f) If the user-defined values add to more than one, an error will be produced.
If the user wants completely flat priors, then the user should specify the same value (=1/(2N+1)) 2N times, e.g. -inputPrior 0.33 -inputPrior 0.33 for the single-sample diploid case.
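To make the restrictions above more concrete, here is a minimal Python sketch of how such a prior vector could be built and checked before passing it to the tool. It is not part of GATK: the helper names (`flat_priors`, `infinite_sites_priors`, `check_priors`, `to_args`) are hypothetical, the theta value of 0.001 is an assumed heterozygosity, and taking the prior for AC=k to be exactly theta/k over the 2N chromosomes is an assumption based on the "proportional to theta/k" wording.

```python
def flat_priors(num_samples):
    """2N identical values of 1/(2N+1), so Pr(AC=0) also ends up as 1/(2N+1)."""
    n_chrom = 2 * num_samples  # diploid samples only
    return [1.0 / (n_chrom + 1)] * n_chrom


def infinite_sites_priors(num_samples, theta=0.001):
    """Assumed shape of the default prior: theta/k for k = 1..2N."""
    n_chrom = 2 * num_samples
    return [theta / k for k in range(1, n_chrom + 1)]


def check_priors(priors, num_samples):
    """Apply restrictions a), d) and f); return the implied Pr(AC=0) from e)."""
    if len(priors) != 2 * num_samples:
        raise ValueError("expected 2N values, one per non-zero allele count")
    if any(p < 0 for p in priors):
        raise ValueError("negative priors are not allowed")
    total = sum(priors)
    if total > 1.0 + 1e-9:  # small tolerance for floating-point rounding
        raise ValueError("priors add up to more than one")
    return max(0.0, 1.0 - total)


def to_args(priors):
    """Format the values as repeated -inputPrior arguments (linear space)."""
    return " ".join(f"-inputPrior {p:.6g}" for p in priors)


if __name__ == "__main__":
    single = flat_priors(1)
    print(to_args(single), "| implied Pr(AC=0) =", check_priors(single, 1))
```

For a cohort of 200 diploid samples, the same helpers would produce 400 values, one for each allele count from 1 to 400.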



  • Sheila, Broad Institute (Member, Broadie)


    Sorry for the late response. I am working on an FAQ article to address this. One of my colleagues, Laura @gauthier did some research and helped us understand this more. However, the short answer for now is that you should only use this argument under direct supervision of a qualified population geneticist. It is an advanced option for very unusual samples or populations where the typical mutation model doesn't apply.


  • Blue (Member)

    No problem at all with the late response.

    In practical terms, my sequenced sample consists of 200 diploid individuals that share the same father but all have different mothers. I'm only really interested in the genetic variation from the mothers. SNPs are expected to deviate from Hardy-Weinberg equilibrium (HWE) a lot.
