marakat

Hi,

I have three pools of HiSeq data with N = 20, 20, and 40 individuals per pool. Each pool was sequenced to ~10x depth. I would like to use the ploidy setting to estimate probable genotypes for each pool, but I'm torn: it doesn't seem correct to estimate 20 or 40 genotypes from pools sequenced to only 10x depth, since at most 10 chromosomes could have been sampled at any given site.

In that case, would it be better to set the ploidy to the depth (10) and estimate 5 genotypes per pool?

Thank you very much!
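The concern above can be made concrete with a quick calculation. The sketch below is illustrative only and is not part of GATK. It assumes diploid individuals (so a pool of N individuals carries 2N chromosome copies), equal DNA contribution from every individual, and that each read comes from one chromosome chosen uniformly at random, independently of the others. Under those assumptions, it computes the expected number of *distinct* chromosomes covered by 10 reads at a site. The function name `expected_distinct_chromosomes` is hypothetical.

```python
def expected_distinct_chromosomes(n_chromosomes: int, depth: int) -> float:
    """Expected number of distinct chromosomes hit by `depth` reads.

    Assumption (not from GATK): each read is drawn independently and uniformly
    from the `n_chromosomes` copies in the pool, with replacement.
    """
    # Probability that one particular chromosome receives none of the reads
    p_missed = (1.0 - 1.0 / n_chromosomes) ** depth
    # Linearity of expectation over all chromosomes in the pool
    return n_chromosomes * (1.0 - p_missed)


# Pools of 20, 20 and 40 individuals, assumed diploid, each at ~10x depth
for individuals in (20, 20, 40):
    chroms = 2 * individuals
    distinct = expected_distinct_chromosomes(chroms, 10)
    print(f"{individuals} individuals ({chroms} chromosomes): "
          f"~{distinct:.2f} distinct chromosomes sampled per site")
```

Because some reads land on the same chromosome twice, the expected number of distinct chromosomes sampled is a little below 10 in every pool, which supports the point that only a small fraction of each pool is observed at any one site.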

## Answers

The fundamental problem is that you don't have enough coverage to estimate allelic fractions reliably in your pools. With 10x coverage and 20 individuals, you have only 0.5x per individual, so you'll have no power to detect low-frequency variation in your pools. In theory, you should still set your ploidy to 20 or 40 so that your GQ and QUAL values are at least mathematically accurate, but you'll only get sensible results for common variants present in a large fraction of the individuals in each pool.
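To illustrate why only common variants are recoverable, here is a minimal sketch using a simple binomial read-sampling model. It is not how GATK computes genotype likelihoods. It assumes diploid individuals (2N chromosomes per pool), no sequencing error, and equal DNA contribution per individual. The "at least 2 alternate reads" threshold is a hypothetical detection criterion chosen only for illustration. The function name `prob_at_least_k_alt_reads` is also hypothetical.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, allele_freq: float, k: int) -> float:
    """P(at least k of `depth` reads carry the alternate allele).

    Assumed model: each read independently carries the alternate allele with
    probability equal to its frequency in the pool (binomial, no errors).
    """
    # Sum the binomial probabilities of seeing 0 .. k-1 alternate reads
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1.0 - allele_freq) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


depth = 10
for individuals in (20, 40):
    chroms = 2 * individuals  # assumes diploid individuals
    print(f"{individuals} individuals: {depth / individuals:.2f}x per individual")
    # A singleton, an allele on a quarter of chromosomes, and one on half of them
    for alt_copies in (1, chroms // 4, chroms // 2):
        af = alt_copies / chroms
        p = prob_at_least_k_alt_reads(depth, af, k=2)  # hypothetical threshold
        print(f"  AF = {af:.3f}: P(>= 2 alt reads) = {p:.3f}")
```

Under these assumptions, a singleton in a 20-person pool yields two or more alternate reads only about 2 to 3% of the time, whereas an allele at 50% frequency almost always does. That matches the point that only variants carried by a large fraction of the pool can be called sensibly at this depth.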