Haplotype Score and Phasing

stechen, University of Pennsylvania (Posts: 23, Member)

Hello! I was wondering whether the HaplotypeScore annotation was restored for HaplotypeCaller in GATK 2.6. Does it have to be called? (It's not included in my VCF file.) Moreover, all of the GT field designations have "/" instead of "|", which according to the following would mean that the results are still unphased:

"GT genotype, encoded as alleles values separated by either of ”/” or “|”, e.g. The allele values are 0 for the reference allele (what is in the reference sequence), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1 or 1|0 etc. For haploid calls, e.g. on Y, male X, mitochondrion, only one allele value should be given. All samples must have GT call information; if a call cannot be made for a sample at a given locus, ”.” must be specified for each missing allele in the GT field (for example ./. for a diploid). The meanings of the separators are: / : genotype unphased | : genotype phased" http://www.1000genomes.org/wiki/Analysis/Variant Call Format/vcf-variant-call-format-version-40

Also, is there a more detailed explanation than what's on the HaplotypeScore documentation page? How is the score determined in UnifiedGenotyper? Does the score have anything to do with phasing? And how is phasing achieved if only the 10 bp surrounding the SNP are examined, a region that likely does not include other SNPs?
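To make the separator convention quoted above concrete, here is a minimal sketch of reading a GT value and deciding whether it is phased. The function name `parse_gt` is made up for illustration and is not part of GATK or any VCF library; it only covers the cases described in the quoted text ("/" unphased, "|" phased, "." for a missing allele, a single value for haploid calls).

```python
import re

def parse_gt(gt):
    """Split a VCF GT string into allele indices and a phased flag.

    Hypothetical helper for illustration only; not part of GATK.
    """
    # Collect the separators: '/' means unphased, '|' means phased.
    separators = re.findall(r"[/|]", gt)
    # A haploid call has no separator, so it is reported as not phased here.
    phased = bool(separators) and all(sep == "|" for sep in separators)
    # '.' marks a missing allele; keep it as None rather than guessing a value.
    alleles = [None if a == "." else int(a) for a in re.split(r"[/|]", gt)]
    return alleles, phased

for gt in ["0/1", "1|0", "./.", "1"]:
    alleles, phased = parse_gt(gt)
    print(gt, alleles, "phased" if phased else "unphased")
```

Run against the GT column of a VCF, a check like this would confirm whether every genotype uses "/" and is therefore unphased.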

Thank you!

Answers

  • jzook (Posts: 17, Member)

    Hi Geraldine,

    Since complex variants (i.e., nearby SNPs and indels) represent a significant fraction of variants, and unphased complex variants are not very useful, it would be really great if the HaplotypeCaller would output phasing for complex variants. The HaplotypeCaller should inherently know the phasing of complex variants based on how the algorithm works, so it seems like this should be a pretty straightforward thing to do. I thought that, a while ago, an older version of HaplotypeCaller could actually output phased haplotypes. Do you have plans to add this ability in the near future?

    Thanks! Justin

  • Geraldine_VdAuwera (Posts: 5,213; Administrator, GSA Member)

    Hi Justin,

    Unfortunately, phasing complex variants is more complicated than it might seem, and it makes the evaluation of the callsets even more difficult, so we have no immediate plans to implement this. But it's interesting to hear that this is something people would want...

    For now, you can add -mergeVariantsViaLD to your HC command line to go back to the old behavior of merging together nearby events if there is enough evidence in the population to support it. But please be aware that in general we don't support that option.

    Geraldine Van der Auwera, PhD
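
For readers who want to try the -mergeVariantsViaLD suggestion above, the following is a rough sketch of how the command line might be assembled. Only the -mergeVariantsViaLD flag comes from the reply itself; the jar name, the file names, and the helper `build_hc_command` are placeholders assumed for illustration, and the remaining arguments follow the usual GATK 2.x pattern (-T, -R, -I, -o). The sketch only builds and prints the command; it does not run it.

```python
import shlex

def build_hc_command(reference, bam, out_vcf, jar="GenomeAnalysisTK.jar"):
    """Assemble a HaplotypeCaller command line (hypothetical helper).

    All paths are placeholders; adjust them to your own setup.
    """
    return [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,          # reference FASTA (placeholder name)
        "-I", bam,                # input BAM (placeholder name)
        "-o", out_vcf,            # output VCF (placeholder name)
        "-mergeVariantsViaLD",    # option mentioned in the reply; not generally supported
    ]

cmd = build_hc_command("reference.fasta", "sample.bam", "sample.hc.vcf")
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

Keep in mind the caveat in the reply: the option restores the old merging behavior, but it is not generally supported.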
