How does PhaseByTransmission define the phasing of Father and Mother?

Hi, I have genotyped a trio and then phased it using GATK's PhaseByTransmission. I then parsed the output VCF file using the BCFtools query command. The output looks as follows:


20 65288 G/T ./. ./.
20 65900 A|A A|G A|A
20 66720 ./. C/A C/A
20 68749 T|C C|C C|T
20 69094 G/A ./. ./.

The phasing information can be interpreted as Mother|Father. I understand that the child can be phased by applying Mendel's laws. But how are the father and mother phased here? For example, in "20 68749 T|C C|C C|T", every member of the trio is phased, even the father (het), without any information about the child's grandparents. As I am new to this field, your clarification would help me a lot in understanding phasing.

Thank you very much,


  • Sheila, Broad Institute (Member, Broadie admin)

    Hi Ruhul,

    Can you tell me a little more about what you did? How did you get the original unphased VCF? Did you follow GATK's Best Practices? What was the exact command you ran for PhaseByTransmission? Can you post some of the VCF records from the PhaseByTransmission output?


  • ruhulsbu, NY (Member)

    Hi Sheila (@Sheila),

    Thank you very much for your reply. I followed the GATK Best Practices tutorial from the "ASHG 2015 GATK Tutorial" and used the dataset from that tutorial (https://www.broadinstitute.org/gatk/blog?id=6162). The commands are exactly the same as those in the tutorial.

    I would also like to know the best way to phase a family trio. I have learned the following pipeline:
    1. Align the genome sequences of the family members and generate BAM files
    2. Run haplotype calling on the family members and generate VCF files
    3. Phase the family members using the VCF and pedigree files
    4. Apply read-backed phasing to the sequences

    However, I did not see any significant differences between the ReadBackedPhasing and PhaseByTransmission outputs, apart from the HP tags in the read-backed phasing file. I expected read-backed phasing to show more haplotypes, including those produced by PhaseByTransmission. Please let me know whether my order of steps is correct, and what the best way is to get the true phased sequences for a child given the father and mother. I would also appreciate your opinion on the phasing of the father and mother from these steps. Do we get real phasing output for them, or is there any way to predict their phasing without knowing the grandparents? Thank you again for all your help.

    With Kind Regards,

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    The pedigree-based phasing produced for the parents is entirely relative to the child. It does not describe the "real" physical phasing of variants relative to each other on haplotypes. You get physical phasing from HaplotypeCaller if you are using the latest version in GVCF mode, so you don't need to use ReadBackedPhasing unless you want to merge MNPs.
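
To illustrate what "relative to the child" can mean, here is a minimal Python sketch of trio phasing at a single site using Mendelian rules only. It is not the PhaseByTransmission implementation: the real tool works from genotype likelihoods and a mutation prior rather than hard genotype calls. The function name `phase_trio` is hypothetical, and the ordering convention (child written as maternal|paternal, each parent written as transmitted|untransmitted) is an assumption that should be checked against the documentation for your GATK version.

```python
from itertools import product
from typing import Optional, Tuple

Genotype = Tuple[str, str]


def phase_trio(mother: Genotype, father: Genotype, child: Genotype
               ) -> Optional[Tuple[str, str, str]]:
    """Phase one site of a trio from hard genotype calls (illustrative only).

    Assumed convention: child is maternal|paternal, each parent is
    transmitted|untransmitted. Returns (mother, father, child) as phased
    strings, or None if the site is ambiguous or a Mendelian violation.
    """
    solutions = set()
    # Try every choice of one allele from the mother and one from the father.
    for m_idx, f_idx in product((0, 1), repeat=2):
        m_allele, f_allele = mother[m_idx], father[f_idx]
        # Keep the choice only if it reproduces the child's genotype.
        if sorted((m_allele, f_allele)) == sorted(child):
            solutions.add(
                (m_allele, mother[1 - m_idx], f_allele, father[1 - f_idx])
            )
    # Zero solutions: Mendelian violation. Several: ambiguous (e.g. all het).
    if len(solutions) != 1:
        return None
    m_t, m_u, f_t, f_u = solutions.pop()
    return f"{m_t}|{m_u}", f"{f_t}|{f_u}", f"{m_t}|{f_t}"


# Genotypes matching the site 20:68749 from the question, entered unphased.
print(phase_trio(mother=("C", "C"), father=("C", "T"), child=("C", "T")))
```

Under these assumptions, the heterozygous father comes out as T|C only because T is the allele the child must have inherited from him. The order says nothing about which of his other variants sit on the same chromosome as T, which is why no grandparent data is needed. A site where all three individuals are heterozygous returns None, since either parent could have contributed either allele.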
