Step order for PhaseByTransmission and ReadBackedPhasing

Erika_Gedvilaite (New Jersey; Member)
edited January 2015 in Ask the GATK team

Hello GATK team,

Our lab is working on a project involving exome sequencing of family trios, and we are interested in determining the parent of origin for the trios. In one of the papers we came across, the project team ran ReadBackedPhasing first and then applied PhaseByTransmission.

I have read previous posts on the GATK forum and looked at the presentations provided by the GATK team, and found that the suggested order is to run PhaseByTransmission before ReadBackedPhasing. We are new to these tools, so if you could shed any light on how the tools work when combined, we would really appreciate it.

The step combinations we have already tried are:

a) 1) SelectVariants 2) ReadBackedPhasing 3) PhaseByTransmission

b) 1) SelectVariants 2) PhaseByTransmission 3) ReadBackedPhasing

Answers

  • Erika_Gedvilaite (New Jersey; Member)

    Thank you, Geraldine, for taking the time to answer my question. We previously used GATK version 3.0, but will now try rerunning our data with the newest GATK version.

    We also have some questions about the output of PhaseByTransmission and ReadBackedPhasing. We cannot find a detailed description of the output for these tools, and we were a bit confused by the changes in genotype. Below are the three outputs, 1) VCF from SelectVariants, 2) VCF from PhaseByTransmission and 3) VCF from ReadBackedPhasing, for one of the variants that was phased:

    1) chr3 58394556 . C G GT:AD:DP:GQ:PL 0/0:24,0:24:66:0,66,850 0/0:28,0:28:60:0,60,900 0/1:44,10:54:99:164,0,1372

    2) chr3 58394556 . C G GT:AD:DP:GQ:PL:TP 0|0:24,0:24:66:0,66,850:6 1|0:28,0:28:0:0,60,900:6 1|0:44,10:54:99:164,0,1372:6

    3) chr3 58394556 . C G GT:AD:DP:GQ:HP:PL:TP 0|0:24,0:24:66:.:0,66,850:6 1|0:28,0:28:0:58394556-1,58394556-2:0,60,900:6 1|0:44,10:54:99:58394556-1,58394556-2:164,0,1372:6

    Could you explain how the genotype changes from step 1 to step 3 in the mother (2nd genotype), and how we should interpret the result?

    Also, how do we interpret the 58394556-1,58394556-2 value (HP tag)?

    If there is documentation that explains this more extensively, we would really appreciate it if you could point us to it.

    Thank you again for all of the help and giving us your time answering these questions.

    Erika

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Erika,

    The phasing tools are developed by external collaborators, @mfromer and @Laurent -- hopefully they can have a look and answer your questions, because I'm not sure, to be honest. I can tell you that the HP flag contains identifiers for the haplotypes on which the variants are located.
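To make the HP description above concrete, here is a minimal Python sketch that splits record 3 from this thread into per-sample fields and pairs each HP identifier with an allele of the GT. It assumes that the HP identifiers are listed in the same order as the alleles in GT; that ordering is an assumption here, not something confirmed in this thread. The function names `parse_samples` and `alleles_by_haplotype` are made up for illustration and are not part of the GATK.

```python
import re

# Record 3 from the thread (ReadBackedPhasing output), copied verbatim.
RECORD = ("chr3 58394556 . C G GT:AD:DP:GQ:HP:PL:TP "
          "0|0:24,0:24:66:.:0,66,850:6 "
          "1|0:28,0:28:0:58394556-1,58394556-2:0,60,900:6 "
          "1|0:44,10:54:99:58394556-1,58394556-2:164,0,1372:6")


def parse_samples(vcf_line):
    """Return one {FORMAT key: value} dict per sample column."""
    fields = vcf_line.split()
    # The records quoted in the thread omit QUAL/FILTER/INFO, so locate the
    # FORMAT column by content rather than by a fixed column index.
    fmt_index = next(i for i, f in enumerate(fields) if f.startswith("GT:"))
    keys = fields[fmt_index].split(":")
    return [dict(zip(keys, s.split(":"))) for s in fields[fmt_index + 1:]]


def alleles_by_haplotype(sample):
    """Map each HP identifier to the GT allele listed at the same position.

    ASSUMPTION: HP identifiers follow the order of the alleles in GT.
    Returns None when the sample has no HP value.
    """
    hp = sample.get("HP", ".")
    if hp == ".":
        return None
    gt_alleles = re.split(r"[|/]", sample["GT"])
    return dict(zip(hp.split(","), gt_alleles))


for i, sample in enumerate(parse_samples(RECORD), start=1):
    print(i, sample["GT"], alleles_by_haplotype(sample))
```

Under that assumption, the second and third samples both place allele 1 (G) on the haplotype labelled 58394556-1 and allele 0 (C) on 58394556-2, while the first sample has no HP value at this site.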

  • Erika_Gedvilaite (New Jersey; Member)

    Hello Geraldine and Laurent,

    Thank you very much for taking time out of your busy schedules to answer our questions. The explanation about the HP tag that Laurent provided really helped us. Geraldine, we also reran our data in both GATK 3.0 and GATK 3.3, and the results are identical, so the output of these tools did not change between versions 3.0 and 3.3.

    Our next question is: how do you use the information given by the ReadBackedPhasing output to infer which parent is the parent of origin for a de novo mutation? How do we infer that a haplotype in the parent was passed down to the affected child? Also, how should we use the surrounding haplotype information to better explain our findings?

    1) chr3 58394556 . C G GT:AD:DP:GQ:PL 0/0:24,0:24:66:0,66,850 0/0:28,0:28:60:0,60,900 0/1:44,10:54:99:164,0,1372

    2) chr3 58394556 . C G GT:AD:DP:GQ:PL:TP 0|0:24,0:24:66:0,66,850:6 1|0:28,0:28:0:0,60,900:6 1|0:44,10:54:99:164,0,1372:6

    3) chr3 58394556 . C G GT:AD:DP:GQ:HP:PL:TP 0|0:24,0:24:66:.:0,66,850:6 1|0:28,0:28:0:58394556-1,58394556-2:0,60,900:6 1|0:44,10:54:99:58394556-1,58394556-2:164,0,1372:6

    Would the example above say that the

    Thank you very much again for all of the advice and information.

    Erika

  • Laurent (Member, Broadie ✭✭)

    Hi Erika,

    At the moment there is no tool within the GATK to infer the parent of origin for de novo mutations. Basically, what needs to be done is to go through the entries and "follow" each haplotype using the HP tags, then sum up the evidence for each parental haplotype.
    I'm attaching a Perl script that we use to do just this. Use at your own risk :) (and let me know if you encounter any problems).
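The attached Perl script is not reproduced in this thread, so the following is only a rough Python sketch of the idea described above, not Laurent's implementation. It assumes that HP identifiers follow the order of the alleles in GT, and that a site is informative when a child allele is present in exactly one parent. All data, haplotype identifiers such as `100-1`, and function names are hypothetical.

```python
import re
from collections import Counter


def split_gt(gt):
    """Split a GT string such as '0/1' or '1|0' into its alleles."""
    return re.split(r"[|/]", gt)


def tally_votes(sites):
    """Count, per (haplotype id, parent), the child alleles seen in only that parent."""
    votes = Counter()
    for site in sites:
        hp = site["child"].get("HP", ".")
        father = set(split_gt(site["father_gt"]))
        mother = set(split_gt(site["mother_gt"]))
        # Skip sites without haplotype ids or with missing parental genotypes.
        if hp == "." or "." in father or "." in mother:
            continue
        # ASSUMPTION: the i-th HP id belongs to the i-th allele in GT.
        for hap_id, allele in zip(hp.split(","), split_gt(site["child"]["GT"])):
            if allele in father and allele not in mother:
                votes[(hap_id, "father")] += 1
            elif allele in mother and allele not in father:
                votes[(hap_id, "mother")] += 1
    return votes


def parent_of_origin(votes, hap_id):
    """Return the parent with more votes for hap_id, or None if tied or unsupported."""
    father, mother = votes[(hap_id, "father")], votes[(hap_id, "mother")]
    if father == mother:
        return None
    return "father" if father > mother else "mother"


# Hypothetical sites sharing one phase set; the last one is the de novo site.
SITES = [
    {"child": {"GT": "0/1", "HP": "100-1,100-2"}, "father_gt": "0/0", "mother_gt": "0/1"},
    {"child": {"GT": "0/1", "HP": "100-2,100-1"}, "father_gt": "1/1", "mother_gt": "0/0"},
    {"child": {"GT": "0/1", "HP": "100-1,100-2"}, "father_gt": "0/0", "mother_gt": "0/0"},
]

votes = tally_votes(SITES)
# In the de novo site the alternate allele is the second allele, so it sits on 100-2.
print(parent_of_origin(votes, "100-2"))
```

In this toy data the haplotype carrying the de novo allele collects two maternal votes and no paternal ones, so the sketch reports the mother. The de novo site itself contributes no vote, since its alternate allele is absent from both parents. Real data would also call for filtering on genotype and phasing quality before counting.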

  • Erika_Gedvilaite (New Jersey; Member)

    Dear Laurent,

    Thank you very much for answering our questions. We will look into the Perl script, and hopefully we can get some results out of it.

    Thank you again for taking time out of your busy day to answer our questions and guide us through this process.

    Erika

  • holtgrewe (Berlin; Member)

    Thanks for all of the information so far. I have one more question.

    @Laurent, in your script you only consider the HP tag in the child. Is the HP name only meaningful within one sample?

    In other words, when 58394556-1 appears for both the child and the father, whether at the same unphased site or at different sites, does this carry any information about whether the haplotype is of parental origin?
