Haplotype Score and Phasing

stechen, University of Pennsylvania, Member, Posts: 23

Hello! I was wondering whether the HaplotypeScore annotation was restored for HaplotypeCaller in GATK 2.6. Does it have to be requested explicitly? (It's not included in my VCF file.) Moreover, all of the GT fields use "/" instead of "|", which according to the following excerpt from the VCF specification would mean that the results are still unphased:

"GT genotype, encoded as alleles values separated by either of ”/” or “|”, e.g. The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, ”.” must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are:
/ : genotype unphased
| : genotype phased" Call Format/vcf-variant-call-format-version-40
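One way to check which separator a call set uses is to inspect the GT values directly. The following is a minimal Python sketch, assuming well-formed GT strings as described in the excerpt above; the function name `parse_gt` is hypothetical and not part of GATK or any VCF library.

```python
import re

def parse_gt(gt):
    """Split a VCF GT string into its allele values and a phased flag.

    Hypothetical helper for illustration; assumes a well-formed GT value
    such as "0/1", "1|0", "./." or a haploid "1".
    """
    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    # A haploid call has no separator, so it is reported as not phased.
    phased = "|" in gt
    return alleles, phased

for gt in ["0/1", "1|0", "./.", "1"]:
    alleles, phased = parse_gt(gt)
    print(gt, alleles, "phased" if phased else "unphased")
```

Applied to the GT column of every record, a check like this would show whether any genotype in the file was written as phased.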

Also, is there a more detailed explanation than what's on the HaplotypeScore documentation page? How is the score determined in UnifiedGenotyper? Does the score have anything to do with phasing? And how is phasing achieved if only the 10 bp surrounding the SNP are examined, a region that likely does not include other SNPs?

Thank you!

Best Answer


  • jzook, Member, Posts: 17

    Hi Geraldine,

    Since complex variants (i.e., nearby SNPs and indels) represent a significant fraction of variants, and unphased complex variants are not very useful, it would be really great if HaplotypeCaller could output phasing for complex variants. HaplotypeCaller should inherently know the phasing of complex variants based on how the algorithm works, so it seems like this should be a pretty straightforward thing to do. I thought that a while ago an older version of HaplotypeCaller could actually output phased haplotypes. Do you have plans to add this ability in the near future?


  • Geraldine_VdAuwera, Administrator, Dev, Posts: 10,995

    Hi Justin,

    Unfortunately, phasing complex variants is more complicated than it might seem, and it makes the evaluation of the callsets even more difficult, so we have no immediate plans to implement this. But it's interesting to hear that this is something people would want...

    For now, you can add -mergeVariantsViaLD to your HC command line to go back to the old behavior of merging together nearby events if there is enough evidence in the population to support it. But please be aware that in general we don't support that option.

    Geraldine Van der Auwera, PhD
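
For readers who want to try the option mentioned above, here is a minimal Python sketch that assembles such a HaplotypeCaller command line. Only `-mergeVariantsViaLD` comes from this thread; the jar name, the file names, and the use of the `-T`, `-R`, `-I` and `-o` arguments are assumptions based on the usual GATK 3-style invocation, and `build_hc_command` is a hypothetical helper. As noted above, the option is not generally supported.

```python
def build_hc_command(reference, bam, out_vcf, jar="GenomeAnalysisTK.jar"):
    """Assemble a HaplotypeCaller command line as a list of arguments.

    Hypothetical helper: the jar name and file paths are placeholders,
    and the argument layout is an assumed GATK 3-style invocation.
    """
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,        # reference FASTA (placeholder path)
        "-I", bam,              # input BAM (placeholder path)
        "-o", out_vcf,          # output VCF (placeholder path)
        "-mergeVariantsViaLD",  # option mentioned in the reply above
    ]

cmd = build_hc_command("reference.fasta", "sample.bam", "output.vcf")
# Print the command for inspection instead of running it.
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` once the paths point at real files.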
