CalculateGenotypePosteriors: do I need to run this on each trio individually?
I have a query regarding the “CalculateGenotypePosteriors” algorithm in GATK and would really appreciate insight from one of the team members.
I have 10 trios and I want to identify de novo mutations.
When I run CalculateGenotypePosteriors, is there any difference in how the genotype posteriors are calculated if I run it on each trio individually, compared with submitting all of them together with a single PED file that defines each of the 10 unique families?
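For reference, a single PED file covering several independent trios just lists each family under its own family ID. The sketch below assumes the standard 6-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype). The sample and family names (`FAM01_child` and so on) are hypothetical placeholders and would need to match the sample names in the VCF.

```python
# Minimal sketch: build one PED file describing several independent trios.
# Sample and family IDs are hypothetical placeholders; use your VCF sample names.
# Standard 6-column PED layout:
#   family ID, individual ID, paternal ID, maternal ID,
#   sex (1=male, 2=female, 0=unknown), phenotype (0=unknown)

def trio_rows(family_id, child, father, mother, child_sex=0):
    """Return the three PED rows (lists of strings) for one trio."""
    return [
        [family_id, father, "0", "0", "1", "0"],   # founder: no parents listed
        [family_id, mother, "0", "0", "2", "0"],   # founder: no parents listed
        [family_id, child, father, mother, str(child_sex), "0"],
    ]

def build_ped(n_trios):
    """Concatenate the rows for n_trios families into tab-separated PED text."""
    rows = []
    for i in range(1, n_trios + 1):
        fam = f"FAM{i:02d}"
        rows.extend(trio_rows(fam, f"{fam}_child", f"{fam}_father", f"{fam}_mother"))
    return "\n".join("\t".join(r) for r in rows) + "\n"

if __name__ == "__main__":
    ped_text = build_ped(10)
    # 10 trios -> 30 individuals -> 30 PED lines
    print(ped_text.count("\n"))
```

Each family is self-contained in this layout, so the file itself does not link one trio to another; whether the tool treats them differently from separate runs is exactly the open question above.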
In addition, my VCF file contains an empirical AF that was calculated across a larger set of 1900 individuals; my 10 families (30 individuals) are a subset of these.
When CalculateGenotypePosteriors and the subsequent VariantAnnotator step for the PossibleDeNovo annotation are run, are AC and AF taken from what is explicitly stated in the VCF file (i.e. the AF calculated from the 1900 individuals), or are they re-estimated from the individuals actually present in the VCF file?
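To make the difference concrete: AC, AN and AF (AF = AC / AN) depend on which samples are counted, so a value stored from 1900 individuals will generally differ from one recounted over 30. The sketch below only illustrates that arithmetic for a biallelic site from GT strings; it makes no claim about which value CalculateGenotypePosteriors or VariantAnnotator actually uses. The function name `allele_counts` and the sample IDs are hypothetical.

```python
# Sketch: recount AC/AN/AF for a biallelic site from GT strings,
# optionally restricted to a subset of samples (e.g. only trio members).
# Assumes biallelic sites: any non-"0", non-missing allele counts as ALT.

def allele_counts(genotypes, samples=None):
    """genotypes: dict sample -> GT string such as "0/1" or "1|1"."""
    ac = an = 0
    for sample, gt in genotypes.items():
        if samples is not None and sample not in samples:
            continue  # skip samples outside the requested subset
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele: not counted in AN
            an += 1
            if allele != "0":
                ac += 1
    af = ac / an if an else None
    return ac, an, af

if __name__ == "__main__":
    gts = {"s1": "0/1", "s2": "0/0", "s3": "1/1", "s4": "0/0"}
    print(allele_counts(gts))                 # whole cohort
    print(allele_counts(gts, {"s1", "s2"}))   # subset only
```

Here the whole cohort gives AF = 3/8 while the two-sample subset gives AF = 1/4, which is the kind of gap the question is about.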
Finally, if I am interested in de novo mutations in 10 trios, is it fine to provide a VCF with all 1900 individuals, as long as the corresponding PED file identifies which individuals are part of a family and which are not? Would this affect how CalculateGenotypePosteriors is performed?
Thanks in advance for your guidance. I was discussing this with a colleague, and we were unclear on whether the different approaches would change the results.