Protocol for ploidy data

Hi,
I have 3 allotetraploid genomes (like cotton) for SNP analysis.
When calling SNPs, should ploidy be set to 2 or 4?

Best Answer

  • shleeshlee Cambridge ✭✭✭✭✭
    edited January 2018 Accepted Answer

    Hi @Kritika,

    It is very cool that you work on an allotetraploid organism.

    [1] If the reference genome you use represents the contigs from the A and D subspecies separately, then you should set your ploidy to 2. Here we assume you will sequence your three samples similarly to how the reference was assembled--by using allohaploids.

    [2] If the reference genome you are using represents the contigs from the A and D subspecies as merged contigs, then you should set your ploidy to 4.

    [3] If the reference genome you are using is like [1] but your samples are like those in [2], then hopefully you have aligned your reads with something like alt-aware alignment and post-alt processing (see links on GRCh38 I give in this post). I can imagine that the A and D subspecies are rather close in evolutionary terms (close enough to form such an allotetraploid organism) and that certain regions of the two genomes, although unique within each subspecies, may be identical between the two. So you will have to account for such regions, which generate secondary alignments.

Answers

  • KritikaKritika IndiaMember

    Hi @Shlee
    My sample is cotton (Gossypium hirsutum), whose reference I believe is merged from the two subspecies A and D.
    So should I set ploidy to 4?

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @Kritika,

    My sample is cotton (Gossypium hirsutum), whose reference I believe is merged from the two subspecies A and D.
    So should I set ploidy to 4?

    If all the merged contigs overlap 100%, then setting ploidy to 4 makes sense. Otherwise, you could use an intervals list to subset regions that were merged (set ploidy to 4) and regions that remain unmerged (set ploidy to 2).
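
The interval-based approach described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes GATK4's HaplotypeCaller, where `-L` restricts calling to an intervals file and `--sample-ploidy` sets the ploidy, and all file names (`reference.fasta`, `sample1.bam`, `merged.list`, `unmerged.list`) are hypothetical placeholders. It only builds and prints the command lines; it does not run GATK.

```python
import shlex


def haplotypecaller_cmd(reference, bam, intervals, ploidy, output):
    """Build (but do not run) a GATK4 HaplotypeCaller command line.

    Assumes GATK4 argument names: -R, -I, -L, -O and --sample-ploidy.
    """
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-L", intervals,                  # restrict calling to these regions
        "--sample-ploidy", str(ploidy),   # ploidy assumed for these regions
        "-O", output,
    ]


# Hypothetical interval lists: one for merged regions, one for unmerged regions.
runs = [
    ("merged.list", 4, "sample1.merged.vcf.gz"),
    ("unmerged.list", 2, "sample1.unmerged.vcf.gz"),
]

commands = [
    haplotypecaller_cmd("reference.fasta", "sample1.bam", intervals, ploidy, out)
    for intervals, ploidy, out in runs
]

for cmd in commands:
    print(shlex.join(cmd))
```

Each sample would get one call per interval list, and the two resulting call sets would then need to be combined downstream. How best to combine them depends on the rest of the pipeline and is not covered here.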

  • KritikaKritika IndiaMember

    Hi @shlee, I did not understand this step:
    "If all the merged contigs overlap 100%, then setting ploidy to 4 makes sense. Otherwise, you could use an intervals list to subset regions that were merged (set ploidy to 4) and regions that remain unmerged (set ploidy to 2)."

    As of now, I have the reference from NCBI, which I believe contains both A and D (A from the AA species and D from the DD species; this combination makes up the A and D in my current reference).
    The literature gives the chromosome number for Gossypium hirsutum as AADD, 2n = 4x = 52.
    This is what I have used for mapping.

    Now, how do I try the step you are describing? What does merging 100% mean?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin
    edited January 2018

    @Kritika
    Hi,

    As of now, I have the reference from NCBI, which I believe contains both A and D (A from the AA species and D from the DD species; this combination makes up the A and D in my current reference). The literature gives the chromosome number for Gossypium hirsutum as AADD, 2n = 4x = 52. This is what I have used for mapping.

    If some of the contigs came only from subspecies A or subspecies D, you would have to consider setting ploidy to 2 for those contigs only. However, in your case, you should be fine setting ploidy to 4 for the entire genome, because it seems like all the contigs are merged from both subspecies.

    I hope this helps.

    -Sheila
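
To illustrate what the ploidy setting changes in practice, here is a small Python sketch that enumerates the unphased genotypes possible at a biallelic site (alleles 0 = reference, 1 = alternate) for ploidy 2 and ploidy 4. The function name `genotypes` is made up for this example, and this is plain combinatorics, not GATK code.

```python
from itertools import combinations_with_replacement
from math import comb


def genotypes(ploidy, n_alleles):
    """List all unphased genotypes for a given ploidy and allele count."""
    return [
        "/".join(str(allele) for allele in combo)
        for combo in combinations_with_replacement(range(n_alleles), ploidy)
    ]


diploid = genotypes(2, 2)
tetraploid = genotypes(4, 2)

print(diploid)     # ['0/0', '0/1', '1/1']
print(tetraploid)  # ['0/0/0/0', '0/0/0/1', '0/0/1/1', '0/1/1/1', '1/1/1/1']

# The count follows the multiset formula C(ploidy + alleles - 1, alleles - 1).
assert len(tetraploid) == comb(4 + 2 - 1, 2 - 1)
```

With ploidy 4 the caller can report intermediate allele dosages such as 0/0/0/1, which a ploidy of 2 cannot represent.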
