# Re-scaling genotype likelihoods

Member

Greetings,

I am trying to incorporate genotype likelihoods into a downstream analysis. I have two questions:

1) Why is the most likely genotype scaled to a Phred score of zero?

2) Is there a way to undo the scaling? I have seen downstream tools undo the scaling, but I don't know how they do it. Is there an equation that will return an estimated genotype likelihood from the scaled genotype likelihoods?

Zev Kronenberg


Hi Zev,

1) This is just a normalization (not a scaling) and does not affect the actual posterior probabilities at all. This isn't the appropriate forum to go over the mathematical rationale, though, so you'll either need to take my word for it or ask for an explanation somewhere like SEQanswers.
2) There is no need to undo the normalization and I cannot imagine that any downstream tools are actually doing this (again see #1). The likelihoods in the VCFs are not "scaled" or "estimated" and should be taken as accurate representations of the data.
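A minimal Python sketch of point 1 (an illustration, not GATK's code; the PL values and the prior are hypothetical): normalizing subtracts the same constant from every PL, which multiplies every likelihood by the same factor, and that factor cancels when the posterior is renormalized.

```python
def posteriors(pls, priors):
    """Posterior genotype probabilities from Phred-scaled likelihoods and priors."""
    # Convert Phred-scaled likelihoods back to the linear scale: L = 10^(-PL/10).
    likelihoods = [10 ** (-pl / 10) for pl in pls]
    # Bayes' rule: posterior is proportional to likelihood * prior.
    unnormalized = [lik * prior for lik, prior in zip(likelihoods, priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


raw_pls = [52, 50, 51]  # hypothetical un-normalized PLs for AA, AB, BB
normalized_pls = [pl - min(raw_pls) for pl in raw_pls]  # -> [2, 0, 1]
priors = [0.25, 0.5, 0.25]  # hypothetical prior (Hardy-Weinberg, allele frequency 0.5)

# Both lines print the same posteriors (up to floating-point rounding).
print(posteriors(raw_pls, priors))
print(posteriors(normalized_pls, priors))
```

Any prior gives the same agreement, since the shared factor cancels in the division by `total`.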

Hope that helps!

Member

I am going to try and clarify my question:

I completely trust the genotype calculations, but I am still having trouble incorporating PL into a population genetics measure. My problem is the normalization:

The normalization sets the most likely genotype to a Phred-scaled likelihood of 0, i.e. a relative likelihood of 1.
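Written out, assuming the standard definition of a Phred-scaled genotype likelihood with $L_g = P(\text{data} \mid g)$ for genotype $g$ (the symbols $L_g$ and $\mathrm{PL}'_g$ are introduced here for illustration):

$$
\mathrm{PL}_g = -10\log_{10} L_g, \qquad \mathrm{PL}'_g = \mathrm{PL}_g - \min_h \mathrm{PL}_h,
$$

$$
10^{-\mathrm{PL}'_g/10} = \frac{L_g}{\max_h L_h}.
$$

So the normalized PLs still give every likelihood ratio exactly; only the shared factor $\max_h L_h$ is discarded.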

"Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification"

"The most likely genotype (given in the GT field) is scaled so that it's P = 1.0 (0 when Phred-scaled), and the other likelihoods reflect their Phred-scaled likelihoods relative to this most likely genotype."

So in the case of a terrible het call, the PLs will be something like (2, 0, 1) for AA, AB, BB.

The problem is assessing the uncertainty of the het call when its own PL is always 0 (a relative likelihood of 1).

When I integrate over the other genotypes (AA and BB), I am concerned that I am introducing a bias.
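One common way to use PLs in a population-genetic summary without committing to the hard GT call is to convert them to genotype probabilities and take the expected alternate-allele dosage, which integrates over all three genotypes, including the called one. A minimal sketch, assuming a flat prior over genotypes (a population prior such as Hardy-Weinberg could be substituted); the function names are hypothetical:

```python
def genotype_probs(pls):
    """Convert PLs for (AA, AB, BB) to probabilities that sum to 1, assuming a flat prior."""
    best = min(pls)  # shift so the most likely genotype has relative likelihood 1
    likelihoods = [10 ** (-(pl - best) / 10) for pl in pls]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


def expected_dosage(pls):
    """Expected number of B alleles: 0*P(AA) + 1*P(AB) + 2*P(BB)."""
    p_aa, p_ab, p_bb = genotype_probs(pls)
    return 0 * p_aa + 1 * p_ab + 2 * p_bb


# The poorly supported het call from the example: PLs (2, 0, 1) for AA, AB, BB.
print([round(p, 3) for p in genotype_probs([2, 0, 1])])
print(round(expected_dosage([2, 0, 1]), 3))
# A confidently called het for comparison.
print(round(expected_dosage([60, 0, 60]), 3))
```

For the (2, 0, 1) example the expected dosage comes out close to 1.07, and every genotype contributes to it in proportion to its probability rather than only the hard call.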

Maybe I don't need to worry about it. I just noticed that other tools, like BEAGLE, that use GATK VCFs, have a modified PL where the most likely genotype is not required to have a phred score of zero.

Thanks.

Member

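I think the easiest way around this is to convert each PL back to the linear scale and divide by the sum over the three genotypes:

10^(-PL/10) / sum(10^(-PL/10))

That replaces the "most likely genotype = 1" normalization with one where the genotype probabilities sum to 1 (assuming a flat prior), which somewhat undoes the normalization.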
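Because Phred values are on a log scale, dividing them directly by their sum does not give probabilities; for (2, 0, 1) it even assigns 0 to the most likely genotype. A minimal Python sketch contrasting the two, assuming a flat prior over genotypes (the function names are hypothetical). Subtracting the minimum first means it also works on PLs that were not normalized to a minimum of 0, such as the modified PLs mentioned above for BEAGLE:

```python
def naive_ratio(pls):
    """Divide the Phred values themselves by their sum (does NOT give probabilities)."""
    total = sum(pls)
    return [pl / total for pl in pls]


def pl_to_probs(pls):
    """Convert PLs to genotype probabilities that sum to 1, assuming a flat prior."""
    best = min(pls)  # works for normalized or un-normalized PLs
    likelihoods = [10 ** (-(pl - best) / 10) for pl in pls]
    total = sum(likelihoods)
    return [lik / total for lik in likelihoods]


pls = [2, 0, 1]  # AA, AB, BB
print([round(x, 3) for x in naive_ratio(pls)])  # the called het gets 0
print([round(x, 3) for x in pl_to_probs(pls)])
```

For (2, 0, 1) the linear-scale version gives roughly 0.26, 0.41 and 0.33 for AA, AB and BB, so the uncertainty of the het call shows up directly. Only likelihood ratios are recovered this way; the constant removed by the normalization cannot be restored, but it also cancels out of anything computed from these probabilities.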