Unexpected genotype likelihoods & genotype qualities
I noticed that some of my genotype qualities (GQ) are equal to 0. I interpret this as meaning that two genotypes are equally likely. However, this interpretation sometimes looks inconsistent with the primary data. For instance, I am confused by the example below:
Both genotypes are homozygous REF.
The first one is supported by 5 out of 5 reads and has a genotype quality of 15. This looks reasonable: with such low coverage, there is still some probability of a heterozygous genotype.
The second genotype is supported by 21 out of 21 reads. However, it has a genotype quality of zero. How can the heterozygous genotype be as likely as the homozygous one given such evidence?
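To make my reading of GQ concrete: as far as I understand the GATK documentation, GQ is the difference between the second-lowest and the lowest PL value, capped at 99. Below is a minimal sketch of that rule. The function name and the PL triples are hypothetical illustrations, not values from my VCF.

```python
def genotype_quality(pl, cap=99):
    """Return GQ as the gap between the two smallest PL values, capped at `cap`.

    Assumption: this mirrors the documented GQ rule; it is not GATK code.
    """
    if len(pl) < 2:
        raise ValueError("need at least two genotype likelihoods")
    best, second = sorted(pl)[:2]
    return min(second - best, cap)


# Hypothetical PL triples ordered as (hom-ref, het, hom-alt).
print(genotype_quality([0, 15, 200]))  # het is 15 Phred units less likely
print(genotype_quality([0, 0, 300]))   # hom-ref and het are tied
```

Under this rule, a GQ of 0 means the two best genotypes have identical PLs, which is why I expected it only when the reads cannot tell them apart.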
Investigating this example further, I noticed a significant difference in ALT allele frequency between these two variants across the whole dataset (~500 samples). The first variant has a low ALT allele frequency (<1%), while the second has a common ALT allele (~50%). Can this explain the unexpected zero genotype quality calculated for the second genotype?
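To check whether allele frequency alone could plausibly do this, I tried a toy Bayesian calculation. This is only a sketch under my own assumptions (Hardy-Weinberg priors from the ALT frequency, a simple binomial read model, a 1% per-read error rate); it is not how GATK computes its likelihoods, and all names in it are hypothetical.

```python
import math


def genotype_posteriors(ref_reads, alt_reads, alt_freq, error_rate=0.01):
    """Toy posterior over diploid genotypes; NOT the GATK model."""
    if not 0.0 < alt_freq < 1.0:
        raise ValueError("alt_freq must be strictly between 0 and 1")
    # Assumed probability that one read shows ALT under each genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    # Assumed Hardy-Weinberg prior from the population ALT frequency.
    q = alt_freq
    prior = {"0/0": (1 - q) ** 2, "0/1": 2 * q * (1 - q), "1/1": q ** 2}
    # Log prior plus binomial log likelihood (constant coefficient dropped).
    log_post = {
        g: math.log(prior[g])
        + alt_reads * math.log(p_alt[g])
        + ref_reads * math.log(1.0 - p_alt[g])
        for g in p_alt
    }
    top = max(log_post.values())
    weights = {g: math.exp(lp - top) for g, lp in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def call_quality(posteriors):
    """Best genotype and the Phred-scaled probability that it is wrong."""
    best = max(posteriors, key=posteriors.get)
    p_wrong = sum(p for g, p in posteriors.items() if g != best)
    if p_wrong == 0.0:
        return best, math.inf
    return best, -10.0 * math.log10(p_wrong)


# The two cases from my question: 5/5 REF reads at ~1% ALT frequency,
# and 21/21 REF reads at ~50% ALT frequency.
print(call_quality(genotype_posteriors(5, 0, 0.01)))
print(call_quality(genotype_posteriors(21, 0, 0.50)))
```

In this toy model the 21-read genotype still comes out more confident than the 5-read one despite the much higher ALT frequency, so I do not see how the prior alone would bring the quality down to zero.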
What factors are taken into account when calculating the genotype likelihoods?
Is this example an expected output?
Could it be an error?
Thank you in advance,