Single-cell population joint calling
I have a question about which pipeline I should use to QC my single-cell sequencing samples.
Here is the design of our experiment. We have several individuals whose sperm cells were sequenced as single cells. For each individual, I have 192 WGS samples that were sequenced on the same flowcell, and I have 20 such individuals. Each sample will have ~1X coverage (haploid).
My first question: will GATK be able to handle 20 x 192 samples for genotyping? Can GATK deal with that kind of coverage, or will it give me weird results?
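To make the scale concrete, here is a quick back-of-the-envelope sketch using only the numbers above. The pooled depth assumes the ~1X per-cell coverage simply adds up across the 192 cells of one individual, which is an approximation on my part.

```python
# Back-of-the-envelope numbers taken from the experiment description.
INDIVIDUALS = 20
CELLS_PER_INDIVIDUAL = 192
COVERAGE_PER_CELL = 1.0  # approximate ("~1X" per haploid cell)

# Number of samples if every cell is genotyped as its own sample.
total_samples = INDIVIDUALS * CELLS_PER_INDIVIDUAL

# Rough depth if the 192 cells of one individual are merged into a bulk sample
# (assumes per-cell coverage is additive).
pooled_coverage = CELLS_PER_INDIVIDUAL * COVERAGE_PER_CELL

print(f"samples to genotype jointly: {total_samples}")
print(f"approx. pooled depth per individual: {pooled_coverage:.0f}X")
```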
Then I want to do some QC against genotyping data we already have from a genotyping array, so I want to create a bulk sample from those 192 samples. I have already tested several pipelines, but I'm not sure which is the best way to proceed. Here are the pipelines I tried:
1) Process each sample separately with HC in GVCF mode -> change the header of each gVCF so that every sample has the same sample name -> run GenotypeGVCFs on all those samples to get a VCF with one sample (I forgot to change the ploidy parameter...)
2) Change the RG tag in the BAM files -> run HC in GVCF mode on all the BAMs at the same time -> run GenotypeGVCFs on the resulting g.vcf file.
3) (Not done yet) Same as 1, but with the ploidy set to 1.
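For reference, the per-sample HC step of pipeline 3 could be sketched as below. This is only a sketch under assumptions: it uses GATK4-style command-line syntax (the GATK version is not stated here, and GATK 3 uses different option spellings), and the file names and the helper `haplotypecaller_cmd` are hypothetical. It only builds the command lines; it does not run them.

```python
from pathlib import Path


def haplotypecaller_cmd(bam, reference, ploidy=1):
    """Build a GATK4-style HaplotypeCaller command (GVCF mode) for one cell."""
    out = Path(bam).stem + ".g.vcf.gz"  # e.g. cell001.bam -> cell001.g.vcf.gz
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", out,
        "-ERC", "GVCF",                  # emit reference confidence (GVCF mode)
        "--sample-ploidy", str(ploidy),  # 1 = treat each sperm cell as haploid
    ]


# Hypothetical file names: one BAM per sperm cell of one individual.
bams = [f"indiv01_cell{i:03d}.bam" for i in range(1, 193)]
commands = [haplotypecaller_cmd(b, "ref.fasta", ploidy=1) for b in bams]

print(" ".join(commands[0]))
```

Each command list could then be passed to `subprocess.run` or written to a job script. The header renaming and GenotypeGVCFs steps are left out on purpose.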
I'm wondering, though, about that ploidy parameter. Here's what is written on the GATK HC doc page:
Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
What should it be in my case? Should I just use 2, since I'm working with 192 sperm cells coming from a diploid organism?
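To spell out the two readings I am hesitating between, here is a minimal sketch of the formula quoted from the doc page. The function name `pool_ploidy` is just for illustration, and I am not claiming either value is the right one for this design.

```python
def pool_ploidy(samples_in_pool, sample_ploidy):
    """Ploidy for pooled data: number of samples in the pool * sample ploidy."""
    return samples_in_pool * sample_ploidy


# Reading A: each sperm cell is a haploid sample, and the pool holds 192 cells.
literal = pool_ploidy(192, 1)

# Reading B: the pool is treated as one diploid individual.
as_individual = pool_ploidy(1, 2)

print(literal, as_individual)
```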
Thanks a lot for your help.