marakat

Hi,

I have three pools of HiSeq data with N = 20, 20, and 40 individuals per pool, each sequenced to ~10x depth. I would like to use the ploidy setting to estimate probable genotypes for each pool, but I'm torn: it doesn't seem correct to estimate 20 or 40 genotypes from pools sequenced to only 10x depth, since at most 10 chromosomes could have been sampled at any site.

So in such a case, would it be more advisable to set the ploidy level to the depth level of 10 and estimate 5 genotypes per pool?

Thank you very much!

## Answers

The fundamental problem is that you don't have enough coverage to reliably estimate allelic fractions in your pools. With 10x and 20 individuals you have only 0.5x per individual, so you'll have no power to detect low-frequency variation. In theory you should set your ploidy to 20 or 40 so that you at least get mathematically accurate measurements of GQ and QUAL, but you'll only get sensible results for common variants present in a large fraction of the individuals in the pool.
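
To put rough numbers on the coverage argument, here is a minimal Python sketch. It is not anything GATK computes. It assumes each read samples one chromosome from the pool uniformly at random, with no sequencing error and equal representation of every individual. The function names and the `organism_ploidy=2` default are my own illustrative assumptions (the thread does not state the organism's ploidy).

```python
from math import comb


def per_individual_coverage(depth, n_individuals):
    """Average depth contributed by each individual in the pool."""
    return depth / n_individuals


def prob_allele_seen(depth, allele_fraction, min_reads=1):
    """Binomial probability that at least `min_reads` of `depth` reads
    carry an allele present at `allele_fraction` in the pool."""
    p_fewer = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1 - allele_fraction) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


def expected_distinct_chromosomes(depth, n_individuals, organism_ploidy=2):
    """Expected number of distinct chromosomes hit by `depth` reads drawn
    with replacement from the pool (organism_ploidy is an assumption)."""
    n_chrom = n_individuals * organism_ploidy
    return n_chrom * (1 - (1 - 1 / n_chrom) ** depth)


# Pools from the question: 20 or 40 individuals at ~10x.
for n in (20, 40):
    singleton = 1 / (n * 2)  # one variant chromosome, assuming diploids
    print(
        n,
        per_individual_coverage(10, n),
        round(prob_allele_seen(10, singleton), 3),
        round(prob_allele_seen(10, 0.5), 3),
    )
```

Under these assumptions, a variant carried on a single chromosome is rarely seen even once at 10x, while a variant at 50% pool frequency is almost always seen, which is the "common variants only" point made above. Raising `min_reads` to 2 or more, which is closer to what a caller needs for a confident call, makes the rare-variant case worse still.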