Inference of genotype likelihoods for lower ploidy based on genotyping at higher ploidy using GATK
Let's say I have a bunch of mixed-ploidy individuals (with biallelic markers) in my data. Some are tetraploid and some are diploid. I choose to run GATK HaplotypeCaller (to get genotype likelihoods) with ploidy set to 4 for all organisms, since I know the highest ploidy level in the data is 4.
My idea is to run the data and obtain genotype likelihoods at the highest resolution, and then downscale those values to a lower ploidy level post hoc.
For instance, given that there are 5 genotype classes/dosage levels for tetraploid organisms (0, 1, 2, 3, or 4 copies of the reference allele), I will get 5 Phred-scaled scores for each locus in each individual. Each score represents the likelihood of a certain count of the reference allele (0 through 4).
Now if I deduce that one of these individuals is a diploid but I've already run the analyses:
 Can I just combine the genotype likelihoods of the 3 heterozygote classes in the tetraploid call (1/3, 2/2, 3/1) to get the genotype likelihood of the one heterozygote class (1/1) in a diploid individual?
 If so, how do I do this quantitatively?
For example, at a locus in an individual that I assumed to be tetraploid during the GATK run, I get these Phred-scaled genotype likelihoods:
0/4  1/3  2/2  3/1  4/0
6    67   0    4    60
But I now know that this individual is diploid, so I am looking for just 3 Phred-scaled genotype likelihoods instead of 5:
0/2  1/1  2/0
?    ?    ?
Would I keep the homozygote classes the same, i.e. 6 and 60, and then just average the 3 intermediate dosage classes to get the heterozygote of the diploid? Or would I perform some other mathematical operation?
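As background for the arithmetic in this question: in the VCF convention a PL is a Phred-scaled likelihood normalized so that the most likely genotype has PL 0, which means a PL can be turned back into a relative likelihood with 10^(-PL/10). The Python sketch below only illustrates that conversion on the example values above; the function name `pl_to_relative_likelihoods` is hypothetical and not part of GATK.

```python
def pl_to_relative_likelihoods(pls):
    """Convert Phred-scaled likelihoods (PLs) to likelihoods that sum to 1.

    Hypothetical helper for illustration only; not a GATK function.
    """
    # Undo the Phred scaling: PL = -10 * log10(likelihood / best likelihood)
    raw = [10 ** (-pl / 10) for pl in pls]
    total = sum(raw)
    # Rescale so the values sum to 1 and are easier to compare
    return [value / total for value in raw]


# Example PLs from the question, in the order listed there
tetraploid_pls = [6, 67, 0, 4, 60]
relative = pl_to_relative_likelihoods(tetraploid_pls)

for pl, value in zip(tetraploid_pls, relative):
    print(f"PL={pl:>2}  relative likelihood={value:.6f}")
```

Because PLs live on a log scale, averaging them directly is not the same as averaging the underlying likelihoods, which is worth keeping in mind when weighing the "average the three classes" idea.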
Thanks,
Vivaswat
Best Answer

shlee (Cambridge)
Hi Vivaswat (@vshastry),
I asked a similar question of our developers about a month ago: how do you back-calculate PLs for regions of different copy number, given PLs produced under a particular ploidy assumption? It's great to see others planning to use higher-ploidy calls to get the most out of genotyping samples with multiple ploidy states.
First, let me rephrase your question to conventions we are used to. You present a strictly biallelic case where ploidy can be either tetraploid or diploid. Here are the possible genotypes, where 0 = reference allele and 1 = alternate allele.
tetraploid
 0/0/0/0
 0/0/0/1
 0/0/1/1
 0/1/1/1
 1/1/1/1
diploid
 0/0
 0/1
 1/1
It is possible to recalculate the PLs for your diploid case, but not necessarily for others. Here's the catch: I am told by one of our developers that this back-calculation is only possible if the higher ploidy is divisible by the lower ploidy (without remainder). In your case, 2 divides cleanly into 4.
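One way to read the divisibility condition, offered here as an assumption rather than something confirmed in this thread: if the likelihood of the reads under a genotype depends only on that genotype's alternate-allele fraction, then each diploid genotype (fractions 0, 1/2, 1) has an exact counterpart among the tetraploid genotypes (0/0/0/0, 0/0/1/1, 1/1/1/1), and the lower-ploidy PLs can be read off by selecting those entries and renormalizing so the smallest is 0. The Python sketch below implements only that reading; `collapse_pls` is a hypothetical helper, not a GATK function, and its output should be checked against real HaplotypeCaller runs at both ploidies.

```python
def collapse_pls(pls, high_ploidy, low_ploidy):
    """Select the PLs whose allele fraction exists at the lower ploidy.

    Assumes a biallelic site with PLs ordered by increasing alternate-allele
    dosage, and assumes (not confirmed by GATK documentation here) that the
    likelihood depends only on the alternate-allele fraction.
    """
    if high_ploidy % low_ploidy != 0:
        raise ValueError("higher ploidy must be divisible by lower ploidy")
    if len(pls) != high_ploidy + 1:
        raise ValueError("expected one PL per dosage level (ploidy + 1)")

    step = high_ploidy // low_ploidy
    # Dosage d at low ploidy matches dosage d * step at high ploidy
    picked = [pls[dosage * step] for dosage in range(low_ploidy + 1)]

    # Renormalize so the most likely genotype again has PL 0
    best = min(picked)
    return [pl - best for pl in picked]


# Example values from the question, treated as ordered by dosage
print(collapse_pls([6, 67, 0, 4, 60], high_ploidy=4, low_ploidy=2))
# -> [6, 0, 60]
```

Under this reading the intermediate tetraploid classes with no diploid counterpart (0/0/0/1 and 0/1/1/1) are dropped rather than averaged in. When the lower ploidy does not divide the higher one, for example 3 into 4, some allele fractions have no counterpart at all, which would explain why the back-calculation is not possible in general.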
You ask how to back-calculate. I think this would be a great exercise for you to perform yourself and report back to us, because it is easy to do! You can call on a het site (0/1) with the two different ploidies and compare the PLs that HaplotypeCaller gives. Also, I think these three workshop slides will be helpful to you:
[three workshop slides, not reproduced]
Let us know how it goes.
Answers
To be clearer, I would like to know how GATK's HaplotypeCaller itself would handle this. Would it pool the genotype likelihoods of the heterozygous classes from the higher ploidy level into one class when I run it with ploidy set to 2?