Dear GATK team,
Do you have mathematical documentation for the algorithm used to call multi-allelic variants, something similar to the samtools note http://samtools.github.io/bcftools/call-m.pdf?
If no such documentation exists, could you tell me the best way to learn about the method GATK uses to call multi-allelic variants?
I suspect these methods articles will help you:
Thanks so much. I have read all four links, but I am afraid they do not answer my questions.
After reading https://www.broadinstitute.org/gatk/guide/article?id=4442, I came up with two possible interpretations of how you handle multi-allelic variants.
First, based on the HaplotypeCaller algorithm documented in that article, you might choose the genotype with the greatest posterior probability for each individual and then combine all individuals. If the combined genotypes contain more than two alleles, you would output the genotype likelihoods for every genotype combination of the observed alleles. Is this correct? I suspect it is not. Could you point me to the relevant source file by name so that I can verify this myself? I have had trouble finding where the relevant code is.
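To make the first interpretation concrete: with K observed alleles at a site, a diploid sample has K(K+1)/2 possible unordered genotypes, and the VCF specification orders the per-genotype likelihoods (the GL/PL fields) so that genotype j/k with j <= k sits at index k(k+1)/2 + j. The sketch below only enumerates that ordering; it illustrates the bookkeeping and is not taken from the GATK source. The function names are hypothetical.

```python
def diploid_genotypes(n_alleles):
    """Return unordered diploid genotypes (j, k) with j <= k, in VCF GL/PL order."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def genotype_index(j, k):
    """Index of diploid genotype j/k in a VCF GL/PL array (allele order irrelevant)."""
    if j > k:
        j, k = k, j
    return k * (k + 1) // 2 + j


if __name__ == "__main__":
    gts = diploid_genotypes(3)
    print(["%d/%d" % g for g in gts])
    # prints ['0/0', '0/1', '1/1', '0/2', '1/2', '2/2']

    # The closed-form index agrees with the enumeration order.
    assert all(genotype_index(j, k) == i for i, (j, k) in enumerate(gts))
    # K alleles give K(K+1)/2 genotypes.
    assert len(gts) == 3 * 4 // 2
```

With three alleles this gives six likelihoods per sample instead of the three in the bi-allelic case, which is the "all genotype combinations with the observed alleles" that the interpretation above refers to.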
Second, the article says "We use the approach described in [Li2011] to calculate the posterior probabilities of non-reference alleles (Methods 2.3.5 and 2.3.6) extended to handle multi-allelic variation."
I went through Li2011 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3198575/ and noticed that in Section 2.1.13 the article actually assumes bi-allelic sites. Could you please explain how you extend Li's algorithm to the multi-allelic case?
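For reference, one standard way to generalize a bi-allelic, likelihood-based allele frequency estimate to K alleles is an EM over per-sample genotype likelihoods under Hardy-Weinberg equilibrium: the prior on genotype j/k is f_j^2 if j = k and 2 f_j f_k otherwise, the E-step computes each sample's genotype posteriors, and the M-step sets each f_a to its expected allele count divided by 2N. This is a sketch of that generic technique, assuming diploid samples, strictly positive likelihoods, and likelihood vectors already in VCF genotype order. It is not a description of what GATK actually implements, and the function names are hypothetical.

```python
def diploid_genotypes(n_alleles):
    # Unordered genotypes (j, k) with j <= k, in VCF GL/PL order.
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def em_allele_freqs(sample_lks, n_alleles, n_iter=100, tol=1e-10):
    """Estimate K allele frequencies from per-sample genotype likelihoods.

    sample_lks: one list per sample of P(reads | genotype), in VCF order.
    Assumes diploid samples and Hardy-Weinberg genotype priors.
    """
    gts = diploid_genotypes(n_alleles)
    freqs = [1.0 / n_alleles] * n_alleles  # uniform starting point
    for _ in range(n_iter):
        counts = [0.0] * n_alleles
        for lks in sample_lks:
            # E-step: genotype posterior is proportional to likelihood * HWE prior.
            weights = []
            for (j, k), lk in zip(gts, lks):
                prior = freqs[j] ** 2 if j == k else 2 * freqs[j] * freqs[k]
                weights.append(lk * prior)
            total = sum(weights)
            # Each sample contributes two alleles, split by posterior weight.
            for (j, k), w in zip(gts, weights):
                counts[j] += w / total
                counts[k] += w / total
        # M-step: frequency = expected allele count / total number of alleles (2N).
        new = [c / (2 * len(sample_lks)) for c in counts]
        delta = max(abs(a - b) for a, b in zip(new, freqs))
        freqs = new
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Three samples, three alleles; each likelihood vector strongly favors one
    # genotype: 0/0, 0/1 and 1/2 (VCF order 0/0, 0/1, 1/1, 0/2, 1/2, 2/2).
    eps = 1e-6
    lks = [
        [1.0, eps, eps, eps, eps, eps],
        [eps, 1.0, eps, eps, eps, eps],
        [eps, eps, eps, eps, 1.0, eps],
    ]
    print(em_allele_freqs(lks, 3))  # approximately [0.5, 0.333, 0.167]
```

The only change from a bi-allelic version of this sketch is that the HWE prior and the allele counting run over all K(K+1)/2 genotypes instead of three.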
Hi @SiyangLiu, the documentation provided only covers the single sample case (where there is only one individual). If you are asking how the multi-sample case is handled, I believe you may find this new document informative.
It is difficult to point you to specific code at the moment because there are many components and it is organized in a way that is a bit hard to follow. Some of this code is currently being rewritten in a more transparent way. We hope that will make it easier for others to read and understand.