Should the ploidy setting for pooled sequencing data depend on depth?


I have three pools of HiSeq data with N = 20, 20, and 40 individuals per pool, each sequenced to ~10x depth. I would like to use the ploidy setting to estimate probable genotypes for each pool, but I'm torn: it doesn't seem correct to estimate 20 or 40 genotypes from pools sequenced to only 10x depth, since at most 10 chromosomes could have been sampled.
In that case, would it be more advisable to set the ploidy to match the depth of 10 and estimate 5 genotypes per pool?

Thank you very much!



  • delangel (Broad Institute, Member)

    The fundamental problem is that you don't have enough coverage to estimate allelic fractions reliably in your pools. With 10x and 20 individuals you have only 0.5x per individual, so you'll have no power to detect low-frequency variation. In theory you should set your ploidy to 20 or 40, so that at least your GQ and QUAL measurements are mathematically accurate, but you'll only get sensible results for common variants present in a large fraction of the individuals in the pool.
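
To put rough numbers on the coverage argument, here is a minimal sketch of the arithmetic. It is an illustration only, not GATK's calling model. It assumes diploid individuals, equal contribution of every individual to the pool, error-free reads, a depth of exactly 10 at the site, and an arbitrary "at least 2 alternate reads" threshold for noticing a variant. The function names are hypothetical.

```python
from math import comb


def per_individual_depth(pool_depth, n_individuals):
    """Average coverage contributed by each individual in the pool."""
    return pool_depth / n_individuals


def prob_at_least_k_alt_reads(depth, allele_fraction, k=2):
    """P(X >= k) for X ~ Binomial(depth, allele_fraction).

    Models the chance of seeing at least k reads carrying an allele
    that is present at `allele_fraction` in the pool.
    """
    p_fewer = sum(
        comb(depth, i) * allele_fraction**i * (1 - allele_fraction) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    depth = 10
    for n in (20, 40):
        chromosomes = 2 * n  # assumption: diploid individuals
        print(f"pool of {n}: {per_individual_depth(depth, n):.2f}x per individual")
        # Compare a singleton allele with a common (50%) allele.
        for label, frac in (("singleton", 1 / chromosomes), ("50% allele", 0.5)):
            p = prob_at_least_k_alt_reads(depth, frac, k=2)
            print(f"  {label}: P(>=2 alt reads at {depth}x) = {p:.3f}")
```

Under these assumptions a singleton allele is almost never seen twice at 10x, while an allele carried by half the chromosomes almost always is, which matches the point that only common variants can be called sensibly at this depth.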
