PhaseByTransmission using reads?

npontikos, University College London, Member
edited January 2016 in Ask the GATK team

I would expect the trio phasing to try and match whole reads from the parents into the child to infer whether a variant came from mum or dad (so to work on the BAM file or at least the output of HaplotypeCaller).
But it seems that the input is a VCF file?


  • KlausNZ, Member ✭✭

    The PBT algorithm does not operate as you expect; it is well described in the tool docs. Also, unless you produce an HC -bamOut file, your approach would mean that PBT operates on the pre-HC BAM, which may create conflicts with calls in your HC VCF that were made at loci where HC's de novo assembly changed the alignments.
    That makes me wonder how RBP handles these conflicts ...

    On the upside, the output of HaplotypeCaller is a (g)VCF file, so one of your expectations is met ;-)

    Would combining PhaseByTransmission with ReadBackedPhasing achieve your aim? We usually phase trios in that order: first leveraging high-quality genotype calls and pedigree information, then phasing the (typically few) remaining unphased loci by read (per sample, and therefore pedigree-agnostic) where the distance between loci and the read length allow.

    You may have valid reasons to apply both tools in inverse order, of course.
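
To illustrate why trio phasing can work from genotypes in a VCF rather than from reads, here is a minimal sketch of transmission-based phasing at a single site. It is a deliberately simplified, deterministic version: the real PhaseByTransmission tool works from genotype likelihoods and a de novo mutation prior, which this sketch ignores. The function names, the status strings, and the maternal|paternal allele ordering are assumptions made for this example.

```python
from itertools import product


def consistent_transmissions(mother, father, child):
    """Return every (maternal_allele, paternal_allele) pair that the parents
    could transmit and that reproduces the child's unordered genotype."""
    return sorted(
        {
            (m, f)
            for m, f in product(mother, father)
            if sorted((m, f)) == sorted(child)
        }
    )


def phase_child(mother, father, child):
    """Phase the child's genotype at one site from unphased trio genotypes.

    Genotypes are allele pairs such as (0, 1). The result is a phased GT
    string written as maternal|paternal (an ordering chosen for this sketch),
    or a status string when the site cannot be phased this way.
    """
    options = consistent_transmissions(mother, father, child)
    if not options:
        # No parental allele combination explains the child's genotype.
        return "mendelian_violation"
    if len(options) > 1:
        # E.g. all three samples heterozygous: origin is ambiguous, so the
        # site is left for a read-based method.
        return "unphased"
    maternal, paternal = options[0]
    return f"{maternal}|{paternal}"


print(phase_child((0, 0), (0, 1), (0, 1)))  # 0|1
print(phase_child((0, 1), (0, 0), (0, 1)))  # 1|0
print(phase_child((0, 1), (0, 1), (0, 1)))  # unphased
print(phase_child((0, 0), (0, 0), (0, 1)))  # mendelian_violation
```

The third case (all three samples heterozygous) is the kind of locus that stays unphased after the pedigree step and is a candidate for read-backed phasing afterwards.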

  • npontikos, University College London, Member

    OK, so you are suggesting doing trio phasing first, then RBP of the remaining unphased SNPs?

  • Sheila, Broad Institute, Member, Broadie, Moderator admin


    Sorry for the late response. I am having someone from the team review this, and we will get back to you soon.


  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    First, keep in mind that phasing by transmission as done by PBT and physical phasing as done by RBP and HC produce qualitatively different information.

    For strictly physical phasing, the output you get from HaplotypeCaller + GenotypeGVCFs should be sufficient, unless you require MNPs to be merged (HC doesn't do this; only RBP will produce merged MNPs). The physical phasing produced by HC+GGVCFs is admittedly limited to the span of each ActiveRegion, but that usually covers the distance within which you can reliably phase variants anyway (beyond that everything is reference, and it's not possible to extend the haplotype resolution unless you have stupendously long reads).

    Then for trio-based evaluation and annotation of possible Mendelian Violations, we apply the Genotype Refinement workflow as described in the Best Practices documentation. We don't usually apply PhaseByTransmission anymore in our pipelines but you could add that on as an additional processing step.
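
For the physical phasing from HaplotypeCaller mentioned above, the information is, to my knowledge, carried in the per-sample PGT (phased genotype) and PID (phase set identifier) FORMAT fields of the output VCF. The sketch below groups variants that share a PID, assuming those fields are present in your GATK version's output. The function name and the three example records are hypothetical and only illustrate the parsing.

```python
from collections import defaultdict


def physical_phase_sets(vcf_lines, sample_index=0):
    """Group variant records by one sample's PID value.

    Returns {pid: [(chrom, pos, pgt), ...]}. Records without a PID for the
    chosen sample are skipped, since they carry no physical phasing.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Column 9 (index 8) names the FORMAT keys; sample columns follow.
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        pid = sample.get("PID", ".")
        if pid != ".":
            groups[pid].append((chrom, pos, sample.get("PGT", ".")))
    return dict(groups)


# Hypothetical records: two nearby variants in one phase set, one unphased.
records = [
    "20\t1000\t.\tA\tG\t50\t.\t.\tGT:PGT:PID\t0/1:0|1:1000_A_G",
    "20\t1010\t.\tC\tT\t50\t.\t.\tGT:PGT:PID\t0/1:0|1:1000_A_G",
    "20\t5000\t.\tG\tA\t50\t.\t.\tGT\t0/1",
]
print(physical_phase_sets(records))
```

Variants that share a PID were phased relative to each other within one assembled region, which is why distant variants such as the third record end up in no group.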
