Aneuploidy samples

Hideo, Member
edited July 2012 in Ask the GATK team

Leishmania has 36 chromosomes, but the copy number of each chromosome is unpredictable from strain to strain and can change very quickly. What is the optimal ploidy setting for organisms with such extensive aneuploidy? So far we have simply used the diploid setting. Some samples consistently show more heterozygous SNPs on higher-copy chromosomes, but this relationship does not hold in many other samples: overall, there is no strong correlation between chromosome copy number and the abundance of heterozygous SNPs.

Answers

  • Hideo, Member

    Or, more specifically, can we change the ploidy setting for each chromosome while calling variants?

  • Mark_DePristo, admin, Broad Institute

    Guillermo may chime in, but I believe you will have to call each chromosome separately with a different ploidy setting in UnifiedGenotyper (UG). This would generalize to any intervals. If it were me, I'd create intervals of haploid copy number, diploid, etc., call each of these with UG using -L, and combine the resulting VCFs. We need to make this more convenient in the future.

  • delangel, Broad Institute, Member

    Indeed - the current use case for the -ploidy argument in UnifiedGenotyper assumes a single ploidy throughout. As Mark said, you should call each chromosome (or interval, or set of chromosomes sharing the same ploidy) separately using different -ploidy arguments.

  • Hideo, Member

    Thank you for the replies. So in principle, population analysis of Leishmania, which has extensive aneuploidy, does not make sense, since a given chromosome can have very different copy numbers across a population of parasites.
    Also, our experiments suggest that ploidy changes within several generations in Leishmania, so it is difficult to come up with a proper model.

  • Hideo, Member

    In a future version, would it be possible for GATK to automatically adjust the ploidy value for each chromosome if the user provides the most abundant ploidy state? For reasonable samples, it is easy to determine the ploidy of a chromosome just from its median read depth. I do not think there are many organisms with ubiquitous aneuploidy, but for those that exist this would be useful.
    [First check the depth, then assign a ploidy value to each chromosome, and then do the analysis ...]
    But, biologically speaking, if aneuploidy is so ubiquitous, then SNPs are probably dominated by the diploid/monosomy state, since extra heterozygous SNPs will be washed away. I think that is the case for Leishmania.

  • Geraldine_VdAuwera, admin, Cambridge, MA (Member, Administrator, Broadie)

    Hi Hideo,

    That's an interesting feature idea. Right now we don't have the resources to make it a priority, but if you or someone else wants to implement it and send us a patch, we'd be happy to check it out and consider including it in a future release.

  • Hideo, Member

    Since I have real data from over 200 samples with aneuploidy, I could possibly write it if I know the guidelines and someone can tell me which part of the GATK code to look at to get started. But does it need to be in Java? I can use Java but have not used it for a long time. It seems the module could be really short: get a median depth from a portion of the genome that is longer than any possible indel (1 Mb?), and then assign ploidy values to the chromosomes after properly normalising them. [Properly normalising just means dividing each chromosome's depth by the median depth of all chromosomes. It is discussed in our paper: http://genome.cshlp.org/content/21/12/2143.long ] It is very simple and it works most of the time.

  • Geraldine_VdAuwera, admin, Cambridge, MA (Member, Administrator, Broadie)
    edited August 2012

    I'm afraid it has to be in Java, yes. See the new Developer Zone category; we have started migrating the existing developer documentation there. Hopefully it will be enough to get you started.

    As a caveat, some articles may need to be updated slightly, so if you have trouble finding something referenced in the articles, or some commands don't give the expected results, please post a comment on those articles and we will check/update as necessary.

    Good luck!
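
For readers who want to try the workaround discussed in this thread without patching GATK, the following is a rough Python sketch, not an official GATK feature. It follows the normalisation Hideo describes (per-chromosome median depth divided by the median depth across all chromosomes), scales by an assumed most common ploidy, groups chromosomes by estimated ploidy, and builds one UnifiedGenotyper command per group, as Mark and delangel suggest. Only -L and -ploidy come from the thread. The function names, the jar name, the file names, the example depths, and the other arguments (-T, -R, -I, -o) are assumptions based on the GATK command line of that era, so check them against the documentation for your version.

```python
from collections import defaultdict
from statistics import median


def estimate_ploidy(chrom_depth, base_ploidy=2):
    """Estimate an integer copy number per chromosome from median read depth.

    Each depth is divided by the median depth across all chromosomes and
    scaled by base_ploidy (the assumed most common ploidy in the sample).
    """
    genome_median = median(chrom_depth.values())
    return {
        chrom: max(1, round(base_ploidy * depth / genome_median))
        for chrom, depth in chrom_depth.items()
    }


def group_by_ploidy(ploidy_by_chrom):
    """Collect chromosomes that share the same estimated ploidy."""
    groups = defaultdict(list)
    for chrom, ploidy in ploidy_by_chrom.items():
        groups[ploidy].append(chrom)
    return dict(groups)


def build_commands(groups, reference, bam, prefix):
    """Build one UnifiedGenotyper command (as an argument list) per ploidy group.

    Assumed layout: java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper ...
    Nothing is executed here; the lists can be passed to subprocess.run.
    """
    commands = []
    for ploidy, chroms in sorted(groups.items()):
        cmd = [
            "java", "-jar", "GenomeAnalysisTK.jar",  # assumed jar name
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam,
            "-ploidy", str(ploidy),
        ]
        for chrom in chroms:
            cmd += ["-L", chrom]  # one -L per chromosome in this group
        cmd += ["-o", f"{prefix}.ploidy{ploidy}.vcf"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Illustrative depths only; real values would come from the BAM file.
    depths = {"chr1": 40, "chr2": 41, "chr3": 60, "chr4": 20, "chr5": 39}
    groups = group_by_ploidy(estimate_ploidy(depths))
    for cmd in build_commands(groups, "ref.fasta", "sample.bam", "sample"):
        print(" ".join(cmd))
```

The per-ploidy VCFs would still need to be combined afterwards, as Mark notes. Rounding the normalised depth to the nearest integer is the simplest possible choice and will misassign chromosomes in samples that are a mixture of cells with different copy numbers, which Hideo points out is a real concern for Leishmania.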