Is PL actually a probability ratio not a likelihood? And consequences for GQ
I read the methods section "Math notes: how PL is calculated in HaplotypeCaller", which says that PL is based on the probability of the genotype given the data. Does this mean that it includes the product of the genotype likelihood and the prior probability of the genotype, and is it therefore actually the (unnormalised) posterior probability rather than a likelihood? (I realise that the prior is flat by default but that it can be altered.)
It then describes GQ as the ratio of the probability of the second-most probable genotype to that of the called genotype (if these are probabilities). Can you please explain how this equates to the probability that the genotype has been wrongly called given that the site is variant (the definition in the VCF format specification)? It doesn't take into account the probabilities of the other possible genotypes, and I don't understand how it is conditional on the site being variant. As an example, what happens when the second-most probable genotype is homozygous reference?
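To make the question concrete, here is a minimal Python sketch of the two quantities I am comparing. It is only my reading of the description above, not GATK's actual code: the three likelihood values are made up, the function names are my own, the prior is assumed flat, and the cap of 99 on GQ is an assumption about how the value is reported.

```python
import math

def phred(p):
    """Phred-scale a probability or probability ratio."""
    return -10.0 * math.log10(p)

def normalized_pl(likelihoods):
    """PL as described above: Phred-scaled ratio of each genotype's
    likelihood to the best one, so the called genotype gets PL = 0."""
    best = max(likelihoods)
    return [round(phred(l / best)) for l in likelihoods]

def gq_from_pl(pl, cap=99):
    """GQ as described above: the second-smallest PL (cap is an assumption)."""
    return min(sorted(pl)[1], cap)

def phred_error_flat_prior(likelihoods):
    """Phred-scaled probability that the called genotype is wrong, using
    ALL genotypes and a flat prior (posterior = likelihood / sum)."""
    p_wrong = 1.0 - max(likelihoods) / sum(likelihoods)
    return phred(p_wrong)

# Hypothetical likelihoods for hom-ref, het, hom-var (illustrative only)
likelihoods = [1e-3, 1e-1, 2e-2]

pl = normalized_pl(likelihoods)
gq = gq_from_pl(pl)
full_error = phred_error_flat_prior(likelihoods)

print("PL:", pl)
print("GQ from second-best PL:", gq)
print("Phred-scaled error over all genotypes:", round(full_error, 2))
```

With these made-up numbers the two values are close but not identical, which is the discrepancy I am asking about: the ratio-based GQ only looks at the runner-up genotype, whereas the error probability sums over every alternative.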
Thanks for any help in understanding this. I am teaching it to students, so I want to check my understanding.