
Should the ploidy setting for pooled sequence data depend on depth?

marakat (Posts: 5, Member)


I have three pools of HiSeq data with N = 20, 20, and 40 individuals per pool. I sequenced each pool to ~10x depth. I would like to use the ploidy setting to estimate probable genotypes for each pool, but I'm torn, because it doesn't seem correct to estimate 20 or 40 genotypes from pools that have only been sequenced to 10x depth (at most 10 chromosomes could have been sampled at any site).
In that case, would it be more advisable to set the ploidy to match the depth of 10 and estimate 5 genotypes per pool?

Thank you very much!



  • delangel (Posts: 71, GATK Developer, mod)

    The fundamental problem is that you don't have enough coverage to reliably estimate allelic fractions in your pools. With 10x and 20 individuals you have only 0.5x per individual, so you'll have no power to detect low-frequency variation in your pools. In theory you should set your ploidy to 20 or 40, so that at least your GQ and QUAL values are mathematically accurate, but you'll only get sensible results for common variants present in a large fraction of the individuals in the pool.
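
The coverage argument in the reply above can be checked with some quick arithmetic. The following is only a rough sketch, not anything GATK does internally. It assumes diploid individuals that contribute equally to the pool, reads that sample chromosomes independently, and no sequencing error. The function names are made up for illustration.

```python
from math import comb


def per_individual_depth(pool_depth, n_individuals):
    """Average depth contributed by each individual in the pool."""
    return pool_depth / n_individuals


def prob_allele_seen(depth, n_chromosomes, allele_copies, min_reads=1):
    """Probability that at least `min_reads` of `depth` reads carry the allele.

    Simple binomial model (an assumption): each read independently carries
    the allele with probability allele_copies / n_chromosomes.
    """
    f = allele_copies / n_chromosomes
    p_fewer = sum(
        comb(depth, k) * f**k * (1 - f) ** (depth - k) for k in range(min_reads)
    )
    return 1 - p_fewer


if __name__ == "__main__":
    depth = 10
    for n in (20, 40):
        chroms = 2 * n  # assumes diploid individuals
        print(f"pool of {n}: {per_individual_depth(depth, n)}x per individual")
        for copies in (1, chroms // 2):
            p = prob_allele_seen(depth, chroms, copies, min_reads=2)
            print(f"  {copies}/{chroms} copies: P(>=2 alt reads) = {p:.3f}")
```

Under these assumptions, a variant carried on a single chromosome in a 20-individual pool shows up in at least one of 10 reads only about 22% of the time, and in two or more reads only about 2.5% of the time, whereas a variant at 50% frequency is seen in two or more reads about 99% of the time. That is consistent with the point that only common variants give sensible results at this depth.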
