# Aneuploidy samples

Member Posts: 10
edited July 2012

Leishmania has 36 chromosomes, but their copy number is unpredictable for each strain, and chromosome copy number can change very quickly. So what is the optimal ploidy setting for organisms with extensive aneuploidy? So far we have used only the diploid setting. Some samples have consistently more heterozygous SNPs in higher-copy chromosomes, but this relationship does not hold in many other samples: there is no strong correlation between chromosome copy number and abundance of heterozygous SNPs.


• Member Posts: 10

Or, more specifically, can we change the ploidy setting for each chromosome when calling variants?

Guillermo may chime in, but I believe you will have to call each chromosome separately with a different ploidy setting in UnifiedGenotyper (UG). This would generalize to any intervals. If it were me, I'd create intervals of haploid copy number, diploid, etc., then call each set with UG using -L and combine the resulting VCFs. We need to make this more convenient in the future.

--
Mark A. DePristo, Ph.D.
Co-Director, Medical and Population Genetics
Broad Institute of MIT and Harvard

Indeed - the current use case for the -ploidy argument in UnifiedGenotyper is to assume a single ploidy throughout. As Mark said, you should call each chromosome (or interval, or set of chromosomes sharing the same ploidy) separately using different -ploidy arguments.
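
For anyone wanting to script the approach described above, here is a minimal Python sketch that groups chromosomes by ploidy and builds one UnifiedGenotyper command line per group, using `-L` for the intervals and `-ploidy` for the copy number. The chromosome names, file names, jar path and output prefix are all hypothetical placeholders, and the step that combines the per-ploidy VCFs is not shown.

```python
from collections import defaultdict


def group_by_ploidy(chrom_ploidy):
    """Group chromosome names by their assigned ploidy, keeping input order."""
    groups = defaultdict(list)
    for chrom, ploidy in chrom_ploidy.items():
        groups[ploidy].append(chrom)
    return dict(groups)


def ug_commands(chrom_ploidy, reference, bam,
                prefix="calls", jar="GenomeAnalysisTK.jar"):
    """Build one UnifiedGenotyper command (an argv list) per ploidy group.

    The prefix and jar defaults are hypothetical; adjust to your setup.
    """
    commands = []
    for ploidy, chroms in sorted(group_by_ploidy(chrom_ploidy).items()):
        cmd = [
            "java", "-jar", jar,
            "-T", "UnifiedGenotyper",
            "-R", reference,
            "-I", bam,
            "-ploidy", str(ploidy),
            "-o", f"{prefix}.ploidy{ploidy}.vcf",
        ]
        # One -L per chromosome sharing this ploidy.
        for chrom in chroms:
            cmd += ["-L", chrom]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical ploidy assignments for three chromosomes.
    example = {"chr1": 2, "chr2": 2, "chr31": 4}
    for cmd in ug_commands(example, "ref.fasta", "sample.bam"):
        print(" ".join(cmd))
```

Each printed line can be run as is, or the argv lists can be passed to `subprocess.run`. Building one command per ploidy group rather than per chromosome keeps the number of runs small when many chromosomes share a copy number.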

• Member Posts: 10

Thank you for the replies. So in principle, population analysis of Leishmania, which has extensive aneuploidy, does not make sense, since a chromosome can have very different copy numbers across a population of parasites.
Also, our experiments suggest that ploidy changes within several generations in Leishmania, so it is difficult to come up with a proper model.

• Member Posts: 10

In a coming version, would it be possible for GATK to automatically adjust the ploidy value for each chromosome if the user provides the most abundant ploidy status? For reasonable samples, it is easy to determine the ploidy value for a chromosome just from its median read depth. I do not think there are many organisms that suffer from ubiquitous aneuploidy, but if there are, then this would be good.
[First check the depth, then assign a ploidy value to each chromosome, and then do the analysis ...]
But, biologically speaking, if aneuploidy is so ubiquitous, then SNPs are probably dominated by the diploid/monosomic state, since extra heterozygous SNPs will be washed away. I think that is the case for Leishmania.

Hi Hideo,

That's an interesting feature idea. Right now we don't have the resources to make it a priority, but if you or someone else wants to implement it and send us a patch, we'd be happy to check it out and consider including it in a future release.

Geraldine Van der Auwera, PhD

• Member Posts: 10

Since I have real data from over 200 samples with aneuploidy, I could possibly write it if I know the guidelines and someone can tell me which section of the GATK code to look at to get started. But does it need to be in Java? I can use Java but have not used it for a long time. It seems the module could be really short: get a median depth from a portion of the genome that is longer than any possible indel (1 Mb?), and then assign ploidy values to the chromosomes after properly normalising them. [Properly normalising means just dividing each chromosome depth by the median depth of all chromosomes. It is discussed in our paper http://genome.cshlp.org/content/21/12/2143.long ] It is very simple and it works most of the time.
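
To make the normalisation concrete, here is a rough Python sketch (not Java, and not GATK code): divide each chromosome's median depth by the median across all chromosomes, scale by the most abundant ploidy supplied by the user, and round to the nearest integer. The function name, the `base_ploidy` parameter, the floor of one copy and the example depths are assumptions made for illustration only.

```python
from statistics import median


def estimate_ploidy(chrom_depth, base_ploidy=2):
    """Estimate an integer copy number per chromosome from read depth.

    chrom_depth: dict mapping chromosome name -> median read depth.
    base_ploidy: the most abundant ploidy, supplied by the user (assumed 2).
    """
    genome_median = median(chrom_depth.values())
    if genome_median <= 0:
        raise ValueError("median depth across chromosomes must be positive")
    estimates = {}
    for chrom, depth in chrom_depth.items():
        # Normalise by the genome-wide median, then scale to copy number.
        copies = base_ploidy * depth / genome_median
        # Assumption: never report fewer than one copy.
        estimates[chrom] = max(1, round(copies))
    return estimates


if __name__ == "__main__":
    # Made-up median depths for five chromosomes.
    depths = {"chr1": 50, "chr2": 52, "chr3": 75, "chr4": 26, "chr5": 100}
    print(estimate_ploidy(depths))
    # prints {'chr1': 2, 'chr2': 2, 'chr3': 3, 'chr4': 1, 'chr5': 4}
```

Note that Python's `round` sends exact halves to the nearest even integer, so borderline chromosomes (for example a scaled value of exactly 2.5) are probably better flagged for manual inspection than trusted blindly.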