
missing physical phasing information in vcf?

bhanuGandham, Cambridge MA (Member, Administrator, Broadie, Moderator)
edited April 19 in Solutions to Problems

The phasing algorithm decides whether to phase two variants by checking whether they always occur on the same haplotype or always occur on different haplotypes. Excess assembled haplotypes severely dilute this signal.

For example, let's say variants A and B both occur on real haplotype H1, but that HC also assembled a similar false haplotype H2. If any reads supporting variant A match H2 better than H1, the phasing via H1 is lost.
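To make the example concrete, here is a minimal sketch of the "always together / always apart" check. It is not the actual GATK code: the function name `phase_relation`, the representation of each haplotype as a set of variant labels, and the assumption that H2 carries A but not B are all hypothetical and chosen only for illustration. Read likelihoods are ignored; any haplotype in the input is treated as one that kept read support.

```python
def phase_relation(haplotypes, var_a, var_b):
    """Classify two variants from the haplotypes that carry them.

    haplotypes: iterable of sets, each holding the variant labels on one haplotype.
    Returns "cis" if the variants always occur together, "trans" if they
    never share a haplotype, and None if no phasing call can be made.
    """
    # Only haplotypes carrying at least one of the two variants are informative.
    carriers = [h for h in haplotypes if var_a in h or var_b in h]
    seen_a = any(var_a in h for h in carriers)
    seen_b = any(var_b in h for h in carriers)
    if not (seen_a and seen_b):
        return None
    if all(var_a in h and var_b in h for h in carriers):
        return "cis"    # always on the same haplotype
    if all((var_a in h) != (var_b in h) for h in carriers):
        return "trans"  # always on different haplotypes
    return None         # mixed evidence: phasing is lost


ref = set()
h1 = {"A", "B"}   # real haplotype carrying both variants
h2 = {"A"}        # hypothetical false haplotype, similar to H1 but without B

clean = phase_relation([ref, h1], "A", "B")
diluted = phase_relation([ref, h1, h2], "A", "B")
print(clean, diluted)
```

With only the real haplotype present, the two variants are phased together. Once the false haplotype H2 is also present, A appears both with and without B, so neither condition holds and no phasing is reported.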

This raises the question of whether we could do better, and the answer is yes, easily. The current code is very naive.
However, instead of improving the phasing algorithm, our current efforts are focused on assembling fewer and better haplotypes.

Basically, the goal is to prevent H2 from existing in the first place, in which case the current naive phasing algorithm will probably work well enough.
