Aneuploidy samples

Hideo (Posts: 10; Member)
edited July 2012 in Ask the team

Leishmania has 36 chromosomes, but the copy number of each chromosome varies unpredictably between strains and can change very quickly. What is the optimal ploidy setting for organisms with such extensive aneuploidy? So far we have used only the diploid setting. Some samples consistently show more heterozygous SNPs on higher-copy chromosomes, but this relationship does not hold in many other samples: overall, there is no strong correlation between chromosome copy number and the abundance of heterozygous SNPs.

Answers

  • Hideo (Posts: 10; Member)

    Or, more specifically: can we set a different ploidy for each chromosome when calling variants?

  • Mark_DePristo (Posts: 153; Administrator, GSA Member)

    Guillermo may chime in, but I believe you will have to call each chromosome separately with a different ploidy setting in UnifiedGenotyper (UG). This generalizes to any set of intervals. If it were me, I'd create interval lists for the haploid regions, the diploid regions, and so on, call each list with UG using -L, and then combine the resulting VCFs. We need to make this more convenient in the future.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • delangel (Posts: 71; GSA Member, mod)

    Indeed - the current use case for the -ploidy argument in UnifiedGenotyper is to assume a single ploidy throughout. As Mark said, you should call each chromosome (or interval, or set of chromosomes sharing the same ploidy) separately, using a different -ploidy argument for each.
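
The per-ploidy calling described in the two replies above could be scripted roughly as follows. This is only a sketch: it groups chromosomes by copy number and builds one UnifiedGenotyper command per group, without running anything. The -L and -ploidy arguments come from the replies; the remaining arguments (-T, -R, -I, -o) and the jar name follow common GATK usage of that era and should be checked against the documentation for your version. The function name, file names, and copy numbers are hypothetical.

```python
from collections import defaultdict


def build_ug_commands(ploidy_by_chrom, reference, bam, out_prefix):
    """Return one UnifiedGenotyper argument list per distinct ploidy.

    ploidy_by_chrom maps chromosome name -> integer copy number.
    The commands are only built here, not executed.
    """
    # Group chromosomes that share the same ploidy.
    chroms_by_ploidy = defaultdict(list)
    for chrom, ploidy in ploidy_by_chrom.items():
        chroms_by_ploidy[ploidy].append(chrom)

    commands = []
    for ploidy in sorted(chroms_by_ploidy):
        cmd = [
            "java", "-jar", "GenomeAnalysisTK.jar",  # assumed jar name
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam,
            "-ploidy", str(ploidy),
            "-o", f"{out_prefix}.ploidy{ploidy}.vcf",
        ]
        # One -L per chromosome in this ploidy group.
        for chrom in chroms_by_ploidy[ploidy]:
            cmd += ["-L", chrom]
        commands.append(cmd)
    return commands


# Hypothetical copy numbers for four chromosomes of one sample.
example = {"chr1": 2, "chr2": 3, "chr3": 2, "chr31": 4}
for cmd in build_ug_commands(example, "ref.fasta", "sample.bam", "sample"):
    print(" ".join(cmd))
```

Each command writes its own VCF, so the per-ploidy outputs still have to be combined afterwards, as Mark suggests; that step is left out of the sketch.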

  • Hideo (Posts: 10; Member)

    Thank you for the replies. So in principle, population analysis of Leishmania, which has extensive aneuploidy, does not make sense, since a given chromosome can have very different copy numbers across a population of parasites. Also, our experiments suggest that ploidy changes within several generations in Leishmania, so it is difficult to come up with a proper model.

  • Hideo (Posts: 10; Member)

    In a future version, would it be possible for GATK to adjust the ploidy value for each chromosome automatically if the user provides the most abundant ploidy state? For reasonable samples, it is easy to determine the ploidy of a chromosome just from its median read depth. I do not think many organisms show ubiquitous aneuploidy, but for those that do, this would be useful. [First check the depth, then assign a ploidy value to each chromosome, and then do the analysis ...] But, biologically speaking, if aneuploidy is that ubiquitous, then SNPs are probably dominated by the diploid/monosomic state, since extra heterozygous SNPs will be washed away. I think that is the case for Leishmania.
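
As an illustration of the "first check the depth" step, here is a minimal sketch that computes the median read depth per chromosome. It assumes the input is tab-separated text with chromosome, position, and depth columns (the layout that `samtools depth` prints); the function name and the sample records are made up for the example.

```python
import statistics
from collections import defaultdict


def median_depth_by_chrom(lines):
    """Median per-position depth for each chromosome.

    `lines` is any iterable of tab-separated records:
    chromosome, position, depth.
    """
    depths = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, _pos, depth = line.split("\t")[:3]
        depths[chrom].append(int(depth))
    return {chrom: statistics.median(values) for chrom, values in depths.items()}


# Made-up records standing in for real depth output.
records = [
    "chr1\t1\t20",
    "chr1\t2\t22",
    "chr1\t3\t21",
    "chr2\t1\t30",
    "chr2\t2\t34",
]
print(median_depth_by_chrom(records))
```

This keeps every depth value in memory, which is simple but may need rethinking for large genomes.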

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)

    Hi Hideo,

    That's an interesting feature idea. Right now we don't have the resources to make it a priority, but if you or someone else wants to implement it and send us a patch, we'd be happy to check it out and consider including it in a future release.

    Geraldine Van der Auwera, PhD

  • Hideo (Posts: 10; Member)

    Since I have real data from over 200 samples with aneuploidy, I could possibly write it if I knew the guidelines and someone could tell me which part of the GATK code to look at to get started. But does it need to be in Java? I can use Java but have not used it for a long time. It seems the module could be really short: get a median depth from a portion of the genome that is longer than any possible indel (1 Mb?) and then assign ploidy values to the chromosomes after properly normalising them. [Properly normalising just means dividing each chromosome's depth by the median depth of all chromosomes. It is discussed in our paper: http://genome.cshlp.org/content/21/12/2143.long] It is very simple and it works most of the time.
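
The normalisation described above might look something like this in code. It is a sketch, not the method from the paper: dividing each chromosome's depth by the median across chromosomes is taken from the post, while multiplying by a baseline ploidy, rounding half up, and clamping to a minimum of 1 are assumptions added here to turn the ratio into an integer ploidy. The function name and depth values are hypothetical.

```python
import math
import statistics


def estimate_ploidy(depth_by_chrom, baseline_ploidy=2):
    """Assign an integer ploidy to each chromosome from its median depth.

    depth_by_chrom maps chromosome name -> median read depth.
    baseline_ploidy is the assumed most common copy number.
    """
    genome_median = statistics.median(depth_by_chrom.values())
    if genome_median <= 0:
        raise ValueError("median depth across chromosomes must be positive")

    ploidy = {}
    for chrom, depth in depth_by_chrom.items():
        # Normalise: chromosome depth relative to the typical chromosome.
        normalised = depth / genome_median
        # Scale to copies and round half up; never report fewer than 1 copy.
        ploidy[chrom] = max(1, math.floor(normalised * baseline_ploidy + 0.5))
    return ploidy


# Hypothetical median depths for five chromosomes of one sample.
depths = {"chr1": 40, "chr2": 41, "chr3": 60, "chr4": 39, "chr5": 80}
print(estimate_ploidy(depths))
```

The approach relies on most chromosomes sitting at the baseline ploidy, so that the median across chromosomes reflects that state; a sample where this does not hold would be mis-scaled.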

  • Geraldine_VdAuwera (Posts: 5,235; Administrator, GSA Member)
    edited August 2012

    I'm afraid it has to be in Java, yes. See the new Developer Zone category; we have started migrating the existing developer documentation there. Hopefully it should be enough to get you started.

    As a caveat, some articles may need to be updated slightly, so if you have trouble finding something referenced in the articles, or some commands don't give the expected results, please post a comment on those articles and we will check/update as necessary.

    Good luck!

    Geraldine Van der Auwera, PhD
