
Aneuploidy samples

Hideo Member Posts: 10
edited July 2012 in Ask the GATK team

Leishmania has 36 chromosomes, but their copy number is unpredictable for each strain, and chromosome copy number can change very quickly. So what is an optimal ploidy setting for organisms with extensive aneuploidy? So far we have used only the diploid setting. Some samples consistently have more heterozygous SNPs in higher-copy chromosomes, but this relationship does not hold in many other samples: there is no strong correlation between chromosome copy number and the abundance of heterozygous SNPs.



  • Hideo Member Posts: 10

    Or, more specifically, can we change the ploidy setting for each chromosome while detecting variants?

  • Mark_DePristo Administrator, Dev Posts: 153 admin

    Guillermo may chime in, but I believe you will have to call each chromosome separately with a different ploidy setting in UG (UnifiedGenotyper). This would generalize to any intervals. If it were me, I'd create intervals of haploid copy number, diploid, etc., call each set with UG using -L, and then combine the resulting VCFs. We need to make this more convenient in the future.

    Mark A. DePristo, Ph.D.
    Co-Director, Medical and Population Genetics
    Broad Institute of MIT and Harvard

  • delangel Dev Posts: 71

    Indeed - the current use case for the -ploidy argument in UnifiedGenotyper is to assume a single ploidy throughout. As Mark said, you should call each chromosome (or interval, or set of chromosomes sharing the same ploidy) separately using different -ploidy arguments.
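
A minimal sketch of the per-ploidy calling suggested above, written in Python purely to assemble the command lines. The chromosome names, file names, and output naming scheme are hypothetical, and the flags (-T, -R, -I, -L, -o, -ploidy) follow GATK 3 command-line conventions as an assumption to verify against the documentation for your version. The script only builds the commands; it does not run GATK.

```python
from collections import defaultdict


def group_by_ploidy(chrom_ploidy):
    """Map each ploidy value to the list of chromosomes that share it."""
    groups = defaultdict(list)
    for chrom, ploidy in chrom_ploidy.items():
        groups[ploidy].append(chrom)
    return dict(sorted(groups.items()))


def build_ug_commands(chrom_ploidy, reference, bam, jar="GenomeAnalysisTK.jar"):
    """Build one UnifiedGenotyper command per ploidy group.

    Assumption: flags follow GATK 3 conventions; output file names are
    hypothetical and chosen only for this illustration.
    """
    commands = []
    for ploidy, chroms in group_by_ploidy(chrom_ploidy).items():
        cmd = [
            "java", "-jar", jar,
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam,
            "-ploidy", str(ploidy),
            "-o", f"calls.ploidy{ploidy}.vcf",
        ]
        # One -L per chromosome restricts this run to the group's intervals.
        for chrom in chroms:
            cmd += ["-L", chrom]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical copy numbers for three chromosomes of one sample.
    example = {"chr1": 2, "chr2": 2, "chr31": 4}
    for cmd in build_ug_commands(example, "ref.fasta", "sample.bam"):
        print(" ".join(cmd))
```

Each run produces one VCF per ploidy group; these would then be combined into a single call set, as Mark describes.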

  • Hideo Member Posts: 10

    Thank you for the replies. So, in principle, population analysis of Leishmania, which has extensive aneuploidy, does not make sense, since a chromosome can have very different copy numbers across a population of parasites.
    Also, our experiments suggest that ploidy changes within several generations in Leishmania, so it is difficult to come up with a proper model.

  • Hideo Member Posts: 10

    In a coming version, would it be possible for GATK to automatically adjust the ploidy value for each chromosome if the user provides the most abundant ploidy status? For reasonable samples, it is easy to determine the ploidy of a chromosome just from its median read depth. I do not think there are many organisms with ubiquitous aneuploidy, but if there are, this would be useful.
    [First check the depth, then assign a ploidy value to each chromosome, and then do the analysis ...]
    But, biologically speaking, if aneuploidy is so ubiquitous, then SNPs are probably dominated by the diploid/monosomy status, since extra heterozygous SNPs will be washed away. I think that is the case for Leishmania.

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,117 admin

    Hi Hideo,

    That's an interesting feature idea. Right now we don't have the resources to make it a priority, but if you or someone else wants to implement it and send us a patch, we'd be happy to check it out and consider including it in a future release.

    Geraldine Van der Auwera, PhD

  • Hideo Member Posts: 10

    Since I have real data from over 200 samples with aneuploidy, I could possibly write it if I knew the guidelines and someone could tell me which part of the GATK code to look at to get started. But does it need to be in Java? I can use Java but have not used it for a long time. It seems the module could be really short: get a median depth from a portion of the genome that is longer than any possible indel (1 Mb?), and then assign ploidy values to the chromosomes after properly normalising them. [Properly normalising just means dividing each chromosome's depth by the median depth of all chromosomes. It is discussed in our paper.] It is very simple and it works most of the time.
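
The normalisation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GATK code: the input is a hypothetical mapping from chromosome name to median read depth, `baseline_ploidy` stands for the "most abundant ploidy status" supplied by the user, and rounding to the nearest integer with a floor of 1 is a simplification introduced here.

```python
from statistics import median


def estimate_ploidy(chrom_median_depth, baseline_ploidy=2):
    """Estimate an integer ploidy per chromosome from median read depths.

    Each chromosome's depth is divided by the median depth over all
    chromosomes, then scaled by the baseline (most common) ploidy.
    Assumptions made for this sketch: most chromosomes are at the baseline
    ploidy, and estimates are rounded and floored at 1.
    """
    genome_median = median(chrom_median_depth.values())
    if genome_median <= 0:
        raise ValueError("median depth across chromosomes must be positive")
    return {
        chrom: max(1, round(depth / genome_median * baseline_ploidy))
        for chrom, depth in chrom_median_depth.items()
    }


if __name__ == "__main__":
    # Hypothetical median depths for five chromosomes of one sample.
    depths = {"chr1": 50, "chr2": 52, "chr3": 75, "chr4": 26, "chr5": 100}
    print(estimate_ploidy(depths))
```

With the hypothetical depths above, the median across chromosomes is 52, so chr3 (depth 75) is assigned ploidy 3 and chr4 (depth 26) ploidy 1. The sketch inherits the limitation raised in the thread: if no single ploidy dominates, the median is not a reliable baseline.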

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,117 admin
    edited August 2012

    I'm afraid it has to be in Java, yes. See the new Developer Zone category, where we have started migrating the existing developer documentation. Hopefully it will be enough to get you started.

    As a caveat, some articles may need to be updated slightly, so if you have trouble finding something referenced in the articles, or some commands don't give the expected results, please post a comment on those articles and we will check/update as necessary.

    Good luck!

    Geraldine Van der Auwera, PhD
