What is the best way to find de novo mutations in trios?

I have around 100 trios for which WES was done.
My goal is to find de novo mutations in the child of each trio.
So first I will do the following steps:
1 - Alignment to the reference genome
2 - Marking duplicates
3 - Base recalibration
4 - Realigning indels
5 - HaplotypeCaller per sample with the -ERC GVCF option (this also runs ReadBackedPhasing, correct?)
6 - Joint genotyping
7 - Variant recalibration
8 - Genotype Refinement workflow, where pedigree information is used and de novos are annotated using VariantAnnotator.

1 - Do you think this workflow is efficient and the best way to find de novos?
2 - Are the variants in the output VCF file produced after step 8 already phased? (because ReadBackedPhasing was already used in step 5)
3 - Do I need to use PhaseByTransmission after step 8?

Many thanks

Answers

  • SkyWarrior (Turkey, Member)

    I would use SelectVariants with JEXL statements to my liking. vc.getGenotype(samplename)........

    JEXL is fun and powerful. Also, if you have Java skills you can code your own VariantSelector using HTSJDK, which I find even more fun nowadays (there is literally no limit to what you can do with it!).

  • Yes, that seems like fun, and would probably be a good approach. I have expertise in Java; the problem is that I do not know which filters/thresholds to use in my own VariantSelector to get high-confidence de novo mutations, as I am still new to this. That is why I am trying to use the existing tools for now.

  • SkyWarrior (Turkey, Member)

    Before going into any threshold filters, I would start simply with the transmission and genotype data from the trios: whether a particular variant is called or not; if it is called, whether it is homozygous or heterozygous, ref or alt; and whether it is found in the father, the mother, and the child. Only after that would I go deep into thresholds.
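The genotype-first check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are given as pairs of allele indices (0 = ref, anything else = alt, None = no call), and the function name `classify_trio_site` and its category labels are hypothetical, not part of GATK or HTSJDK. It ignores quality thresholds entirely, which is the point of doing this step first.

```python
from typing import Optional, Tuple

# A diploid genotype as two allele indices (0 = ref); None means "not called".
Genotype = Optional[Tuple[int, int]]


def classify_trio_site(child: Genotype, mother: Genotype, father: Genotype) -> str:
    """Classify one site in a trio using genotypes only (no thresholds)."""
    # A site can only be judged if all three family members were called.
    if child is None or mother is None or father is None:
        return "incomplete"

    a, b = child
    # Mendelian-consistent: one child allele can come from each parent.
    if (a in mother and b in father) or (b in mother and a in father):
        return "mendelian_consistent"

    # De novo candidate: both parents hom-ref, child carries exactly one alt allele.
    parents_hom_ref = mother == (0, 0) and father == (0, 0)
    child_alt_count = sum(1 for allele in child if allele != 0)
    if parents_hom_ref and child_alt_count == 1:
        return "denovo_candidate"

    # Anything else violates Mendelian inheritance in some other way.
    return "mendelian_violation"


if __name__ == "__main__":
    print(classify_trio_site((0, 1), (0, 0), (0, 0)))
    print(classify_trio_site((0, 1), (0, 1), (0, 0)))
```

Sites flagged as candidates this way would still need depth and genotype-quality filtering afterwards, as noted in the reply above.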

  • Thank you so much for your helpful answer!
    One more related question.
    Since HaplotypeCaller does physical phasing, and possible de novo mutations are also annotated in the Genotype Refinement workflow, could you please let me know what value using PhaseByTransmission afterwards would add to the whole process? As far as I can see, the de novo mutations are already detected.
    I am asking because the paper entitled "A framework for the detection of de novo mutations in family-based sequencing data" mentions the following:
    'We developed phasebytransmission to identify de novo single nucleotide variants and short insertions and deletions ... etc'
    But, as I have mentioned, the de novo mutations are already detected, so what is the important role of PhaseByTransmission in detecting de novos?

  • Maybe PhaseByTransmission would help with phasing if one of the parents is missing (in the case of duos)?

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @lait
    Hi,

    PhaseByTransmission is not necessary, and can be redundant if you are using the Genotype Refinement workflow. The only thing it can help with is re-phasing after the genotypes and posteriors have been updated.

    -Sheila

  • Thank you!
    I have external trio VCF files that were produced by VarScan in trio mode, and I need to phase them. What do you suggest in this case: ReadBackedPhasing followed by PhaseByTransmission, or just ReadBackedPhasing? (Again, the data was processed by VarScan, so NO Genotype Refinement workflow has been applied.)

  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @lait
    Hi,

    I guess it depends on what type of phasing you are interested in. I think you can use ReadBackedPhasing only for physical phasing. PhaseByTransmission will take into account the family relationships when phasing.
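To illustrate what "taking family relationships into account" means for phasing, here is a minimal Python sketch of the idea at a single site. It is not PhaseByTransmission's actual algorithm; it works on hard genotype calls only, and the function name `phase_by_transmission` and the tuple encoding (allele indices, 0 = ref) are assumptions made for this example.

```python
from typing import Optional, Tuple

Genotype = Tuple[int, int]  # two allele indices, 0 = ref


def phase_by_transmission(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[Tuple[int, int]]:
    """Return (maternal_allele, paternal_allele) for the child, or None.

    None means the site cannot be phased from the parents' genotypes alone:
    either the assignment is ambiguous (e.g. all three are heterozygous)
    or the genotypes are not Mendelian-consistent.
    """
    a, b = child
    options = set()
    # Option 1: first child allele from the mother, second from the father.
    if a in mother and b in father:
        options.add((a, b))
    # Option 2: second child allele from the mother, first from the father.
    if b in mother and a in father:
        options.add((b, a))
    # Phase is determined only when exactly one assignment is possible.
    if len(options) == 1:
        return options.pop()
    return None


if __name__ == "__main__":
    # Child 0/1, mother 0/1, father 0/0: the alt allele must be maternal.
    print(phase_by_transmission((0, 1), (0, 1), (0, 0)))
```

Read-backed (physical) phasing, by contrast, relies on reads spanning nearby variants within one sample, so it can phase sites like the all-heterozygous case that this transmission-based sketch leaves unresolved.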

    -Sheila
