The current GATK version is 3.7-0

Is PL actually a probability ratio not a likelihood? And consequences for GQ

Lucy_gen (Sheffield), Member, Posts: 6

I read the methods article "Math notes: how PL is calculated in HaplotypeCaller", which says that PL is based on the probability of the genotype given the data. Does this mean that it includes the product of the genotype likelihood and the prior probability of the genotype, and is therefore actually the (unnormalised) posterior probability rather than a likelihood? (I realise that the prior is flat by default but can be altered.)

It then describes GQ as the ratio of the probability of the second-most probable genotype to that of the called genotype (if these are probabilities). Could you please explain how this equates to the probability that the genotype has been wrongly called, given that the site is variant (the definition in the VCF format specification)? It doesn't take into account the probabilities of the other possible genotypes, and I don't understand how it is conditional on the site being variant. For example, what happens when the second-most probable genotype is homozygous reference?

Thanks for any help in understanding this. I am teaching it to students, so I want to check my understanding.

Best Answer

Answers

  • Lucy_gen (Sheffield), Member, Posts: 6

    Thank you for your answer, shlee. You didn't address whether GQ matches its definition in the VCF format specification. I don't see how it is conditioned on the site being variant. Could you please look at that part again?

  • shlee (Cambridge), Member, Administrator, Broadie, Moderator, Dev, Posts: 422

    To clarify, here's a slide from a HaplotypeCaller presentation that uses toy values to simply illustrate the calculations:

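    [Image: HaplotypeCaller presentation slide with toy values; the middle row shows raw PLs and the last row shows normalized PLs]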

    We log10-transform each genotype probability and multiply by -10 to obtain the raw PLs (middle row). We then subtract the smallest raw PL from each raw PL so that the most likely genotype's PL is zero (last row). The distance to the next most likely genotype is therefore that genotype's normalized PL. The genotype quality (GQ) captures this distance: GQ is the PL of the next most likely genotype, capped at 99.

    Remember, @Lucy_gen, a low GQ does not necessarily mean a bad variant call. You can have a good variant call with a low GQ: we can be sure a site is not hom-ref, yet not be sure whether it is het or hom-var.
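For anyone who wants to try the arithmetic described in the answer above, here is a minimal Python sketch. It is not GATK code. The function name `pl_and_gq`, the genotype probabilities, and the rounding of PLs to integers are all assumptions made for illustration; the values on the slide are not reproduced here. Every input probability must be greater than zero.

```python
import math

def pl_and_gq(genotype_probs):
    """Toy PL/GQ calculation (hypothetical helper, not part of GATK).

    genotype_probs maps a genotype string to a probability > 0.
    Returns (raw_pls, normalized_pls, gq).
    """
    # Raw PL: -10 * log10(probability) for each genotype.
    raw = {gt: -10 * math.log10(p) for gt, p in genotype_probs.items()}
    # Subtract the smallest raw PL so the most likely genotype gets PL 0.
    # Rounding to integers is an assumption of this sketch.
    best = min(raw.values())
    normalized = {gt: round(pl - best) for gt, pl in raw.items()}
    # GQ: PL of the next most likely genotype, capped at 99.
    gq = min(sorted(normalized.values())[1], 99)
    return raw, normalized, gq

# Hypothetical case 1: a clear het call.
_, pls_clear, gq_clear = pl_and_gq({"0/0": 1e-10, "0/1": 0.9, "1/1": 0.01})
print(pls_clear, gq_clear)  # {'0/0': 100, '0/1': 0, '1/1': 20} 20

# Hypothetical case 2: certainly not hom-ref, but het vs hom-var is unclear.
_, pls_close, gq_close = pl_and_gq({"0/0": 1e-12, "0/1": 0.5, "1/1": 0.4})
print(pls_close, gq_close)  # {'0/0': 117, '0/1': 0, '1/1': 1} 1
```

In the second case the hom-ref PL is very large while GQ is only 1, which is the situation described above: the site is confidently variant even though the exact genotype is uncertain. Because the normalized PL is a difference of log-scaled values, GQ computed this way equals -10 * log10 of the ratio between the second-best and best genotype probabilities (before rounding and capping), so it reflects only those two genotypes.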
