Caller input_prior option

magicDGSmagicDGS Member
edited July 2015 in Ask the GATK team

I'm using HaplotypeCaller (but it could be also possible to use this option with UnifiedGenotyper) for a very special experimental design in a no-human species, where we have an expectation for the prior probabilities of each genotype. I'm planning to call SNPs for single diploid individuals using HaplotypeCaller and afterwards for the whole dataset with GenotypeGVCFs.

Nevertheless, I'm confused about the structure of the prior probabilities command line. In the documentation, it says: "Values will be added and Pr(AC=0) will be 1-sum, so that they sum up to one. f) If user-defined values add to more than one, an error will be produced". So I'll require to provide two prior probabilities out of the 3 for each genotype (0/0, 0/1 and 1/1). My first guess is that the prior that I don't need to provide is for the reference homozygous (0/0) based on the Pr(AC=0) specified in the documentation. I would like to know if this idea is correct.

My second problem if is the two input_prior options are positional parameters. If so, and if my first guess for the Pr(AC=0) is correct, do they represent the probability of 0/1 and 1/1, that is, Pr(AC=1) and Pr(AC=2)?

More concretely, I'm going to provide an example where you don't expect any heterozygous call. In that case, is it correct that the argument will be --input_prior 0.5 --input_prior 0?

Thank you very much.

Best Answers

Answers

  • magicDGSmagicDGS Member

    And if I use the GenotypeGVCFs with the prior, will it change the genotypes in the correct way?

    Thank you very much, Sheila.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @magicDGS
    Hi,

    Unfortunately, GenotypeGVCFs ignores the -inputPrior as well. I will try to get one of the developers to fix it asap.

    Thanks for being patient.

    -Sheila

  • FerFer AustriaMember
    edited October 2015

    Hi,
    I ran into this thread cause I have a similar concern. Is -input_prior working in GenotypeGVCFs now?
    Cheers

  • Hi Sheila,

    I am trying to run GenotypeGVCFs with a flat prior distribution. I have 36 individuals to genotype from gVCF's so specified 1/(72+1) 72 times, i. e. -inputPrior 0.0136986301369863 specified 72 times on the command line. However, I get the following error message:

    ERROR MESSAGE: Invalid command line: Argument inputPrior has a bad value: Invalid length of inputPrior vector: vector length must be equal to # samples +1

    Shouldn't this vector length be 2 * # samples, one value for every possible AC greater 0 ?! At least that is how I understand the description of this option:

    If user wants completely flat priors, then user should specify the same value (=1/(2N+1)) 2N times,e.g. -inputPrior 0.33 -inputPrior 0.33 for the single-sample diploid case.

    many thanks for your help

    claudius

    Issue · Github
    by Sheila

    Issue Number
    1448
    State
    open
    Last Updated
    Assignee
    Array
    Milestone
    Array
  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @claudiusk
    Hi Claudius,

    Can you please post the exact command you ran? I think the issue may occur when one of the samples has a no-call. Can you check if that is the case?

    Thanks,
    Sheila

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @claudiusk, you were right about the documentation error; that will be fixed by the next release. We still think there's a bug in handling no-calls, but we've got a test case for debugging so we don't need anything more from you at this point. Thanks for reporting this.

Sign In or Register to comment.