Caller input_prior option
I'm using HaplotypeCaller (although this option should also work with UnifiedGenotyper) for a very special experimental design in a non-human species, where we have an expectation for the prior probability of each genotype. I'm planning to call SNPs for single diploid individuals using HaplotypeCaller and afterwards for the whole dataset with GenotypeGVCFs.
Nevertheless, I'm confused about how the prior probabilities are structured on the command line. The documentation says: "Values will be added and Pr(AC=0) will be 1-sum, so that they sum up to one. f) If user-defined values add to more than one, an error will be produced". So I need to provide two of the three prior probabilities, one per genotype (0/0, 0/1 and 1/1). My first guess is that the prior I don't need to provide is the one for the homozygous reference genotype (0/0), based on the Pr(AC=0) mentioned in the documentation. I would like to know if this is correct.
My second question is whether the two input_prior options are positional parameters. If so, and if my first guess about Pr(AC=0) is correct, do they represent the probabilities of 0/1 and 1/1, that is, Pr(AC=1) and Pr(AC=2)?
More concretely, I'm going to provide an example where you don't expect any heterozygous call. In that case, is it correct that the argument will be input_prior 0.5 input_prior 0?
Thank you very much.
Best Answers

Sheila, Broad Institute (Member, Broadie, Moderator)
@magicDGS
Hi,
Pr(AC=0) is the one that's not explicitly given, so you would give input_prior 0 input_prior 0.5 (for AC=1 and AC=2, respectively).
However, please note the input_prior argument is not working right now. I just moved it up in priority, so hopefully it will be fixed soon. I will let you know as soon as it works.
Sheila
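For readers following along, here is a minimal Python sketch of the rule described above: the supplied values are taken in order as Pr(AC=1), Pr(AC=2), and so on, and Pr(AC=0) is whatever remains. The helper names `implied_ref_prior` and `prior_args` are hypothetical, and the `--input_prior` spelling with leading dashes is an assumption based on the option name used in this thread. This only illustrates the arithmetic and the argument order; it is not GATK code.

```python
def implied_ref_prior(priors):
    """Return Pr(AC=0) implied by user-supplied priors for AC=1..n."""
    if any(p < 0 for p in priors):
        raise ValueError("priors must be non-negative")
    total = sum(priors)
    if total > 1:
        # Mirrors the documented behaviour: values adding to more than one are an error.
        raise ValueError("user-defined priors add to more than one")
    return 1.0 - total


def prior_args(priors, flag="--input_prior"):
    """Build repeated flag/value pairs in AC=1, AC=2, ... order (flag spelling assumed)."""
    args = []
    for p in priors:
        args += [flag, str(p)]
    return args


# No heterozygous calls expected for a single diploid sample:
# Pr(AC=1) = 0 and Pr(AC=2) = 0.5, which leaves Pr(AC=0) = 0.5.
priors = [0.0, 0.5]
print(implied_ref_prior(priors))
print(" ".join(prior_args(priors)))
```

The order matters here: swapping the two values would instead put all of the non-reference prior mass on the heterozygous genotype.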
Answers
And if I use the GenotypeGVCFs with the prior, will it change the genotypes in the correct way?
Thank you very much, Sheila.
@magicDGS
Hi,
Unfortunately, GenotypeGVCFs ignores the inputPrior as well. I will try to get one of the developers to fix it asap.
Thanks for being patient.
Sheila
Hi,
I ran into this thread because I have a similar concern. Is input_prior working in GenotypeGVCFs now?
Cheers
@Fer
Hi,
Yes! It looks like the fix was made in August.
Sheila
Hi Sheila,
I am trying to run GenotypeGVCFs with a flat prior distribution. I have 36 individuals to genotype from gVCFs, so I specified 1/(72+1) 72 times, i.e. inputPrior 0.0136986301369863 given 72 times on the command line. However, I get the following error message:
Shouldn't this vector length be 2 * # samples, one value for every possible AC greater than 0?! At least that is how I understand the description of this option:
Many thanks for your help,
claudius
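The arithmetic behind the flat prior above can be sketched in Python. This assumes, as in the post, diploid samples and one prior value per possible allele count from 1 to 2 * samples, with Pr(AC=0) taking the remainder. The function name `flat_priors` is hypothetical, and the sketch says nothing about how GATK itself validates the vector.

```python
def flat_priors(n_samples, ploidy=2):
    """Return a flat prior over AC=1..n_chrom, leaving an equal share for AC=0."""
    n_chrom = n_samples * ploidy      # 36 diploid samples -> 72 chromosomes
    value = 1.0 / (n_chrom + 1)       # 73 possible AC states (0..72), equal weight each
    return [value] * n_chrom


priors = flat_priors(36)
print(len(priors))          # number of values to pass on the command line
print(priors[0])            # the per-AC prior, 1/73
print(1.0 - sum(priors))    # implied Pr(AC=0), also about 1/73
```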
@claudiusk
Hi Claudius,
Can you please post the exact command you ran? I think the issue may occur when one of the samples has a no-call. Can you check if that is the case?
Thanks,
Sheila
Hi @claudiusk, you were right about the documentation error; that will be fixed by the next release. We still think there's a bug in the handling of no-calls, but we've got a test case for debugging, so we don't need anything more from you at this point. Thanks for reporting this.