Inference of genotype likelihoods for lower ploidy based on genotyping at higher ploidy using GATK

vshastry (Laramie, Wyoming), Member
edited October 2018 in Ask the GATK team

Let's say I have a set of mixed-ploidy individuals (with biallelic markers) in my data. Some are tetraploid and some are diploid. I choose to run GATK HaplotypeCaller (to get genotype likelihoods) with -ploidy set to 4 for all individuals, since I know the highest ploidy level in the data is 4.

My idea is to obtain genotype likelihoods at the highest resolution and then downscale those values to a lower ploidy level post hoc.

For instance, given that there are 5 genotype classes/dosage levels for tetraploid organisms (0, 1, 2, 3 or 4 copies of the reference allele), I will get 5 phred-scaled scores for each locus in each individual. Each score is the phred-scaled likelihood of the genotype with a given count of the reference allele (0 through 4), scaled so that the best-supported genotype has a score of 0.

Now suppose I deduce that one of these individuals is a diploid, but I've already run the analyses:

  • Can I just combine the genotype likelihoods of the 3 heterozygote classes in the tetraploid call (1/3, 2/2, 3/1) to get the genotype likelihood of the one heterozygote class (1/1) in a diploid individual?
  • If so, how do I do this quantitatively?

For example, at a locus in an individual that I assumed to be tetraploid during the GATK run, I get these phred-scaled genotype likelihoods:
0/4   1/3   2/2   3/1   4/0
6     67    0     4     60
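
For reference, a phred-scaled genotype likelihood (PL) relates to a relative likelihood through L = 10^(-PL/10), with the best-supported genotype at PL = 0. The sketch below converts the example values above; the function names are illustrative and are not part of GATK.

```python
def pl_to_relative_likelihoods(pls):
    """Convert phred-scaled likelihoods (PLs) to likelihoods relative to
    the best genotype, using L = 10 ** (-PL / 10)."""
    return [10 ** (-pl / 10) for pl in pls]


def normalize(values):
    """Rescale values so they sum to 1 (equivalent to assuming a flat prior
    over genotypes, which is an assumption made here for illustration)."""
    total = sum(values)
    return [v / total for v in values]


# Example PLs from the post, in the order 0/4, 1/3, 2/2, 3/1, 4/0.
tetraploid_pls = [6, 67, 0, 4, 60]
relative = pl_to_relative_likelihoods(tetraploid_pls)
normalized = normalize(relative)

for label, value in zip(["0/4", "1/3", "2/2", "3/1", "4/0"], normalized):
    print(label, value)
```

Because the scale is logarithmic, averaging PL values directly is not the same as averaging the underlying likelihoods.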

But I now know that this individual is diploid, so I am now looking for just 3 phred-scaled genotype likelihoods instead of 5:
0/2   1/1   2/0
?     ?     ?

Would I keep the homozygote classes the same, i.e. 6 and 60, and then just average the 3 heterozygote dosage classes to get the diploid heterozygote? Or would I perform some other mathematical operation?
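
One possible way to think about this, offered as a sketch rather than a description of what HaplotypeCaller does internally: under the commonly used model in which each read is equally likely to come from any chromosome copy, a tetraploid genotype with 2 of 4 reference copies has the same expected allele fraction (1/2) as a diploid heterozygote, and the two tetraploid homozygotes match the two diploid homozygotes. Under that assumption, the diploid scores would be taken from the 0/4, 2/2 and 4/0 classes rather than averaged across the three heterozygote classes, then rescaled so the smallest is 0. The function name below is hypothetical, and reported PLs are rounded, so the result is only approximate.

```python
def tetraploid_to_diploid_pls(pls):
    """Approximate diploid PLs from biallelic tetraploid PLs.

    Assumes `pls` holds 5 values ordered by allele dosage (0..4 copies)
    and that a balanced tetraploid genotype (2 of 4 copies) is equivalent
    to a diploid heterozygote under an equal-sampling read model.
    """
    if len(pls) != 5:
        raise ValueError("expected 5 PL values for a biallelic tetraploid call")
    # Keep the dosage classes whose allele fractions (0, 1/2, 1) exist in a diploid.
    subset = [pls[0], pls[2], pls[4]]
    # Rescale so the best-supported genotype has PL = 0, as PLs are conventionally reported.
    best = min(subset)
    return [pl - best for pl in subset]


# Example PLs from the post, in the order 0/4, 1/3, 2/2, 3/1, 4/0.
diploid_pls = tetraploid_to_diploid_pls([6, 67, 0, 4, 60])
print(diploid_pls)
```

Rerunning HaplotypeCaller with -ploidy 2 on the individuals identified as diploid remains the way to get values computed directly under the diploid model.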


Best Answer


  • vshastry (Laramie, Wyoming), Member

    To be clearer: I would like to know how GATK's HaplotypeCaller would perform an operation like this. When I run it with -ploidy set to 2, does it pool the genotype likelihoods of the heterozygote classes from a higher ploidy level into one class?
