Single-cell population joint calling

JCGrenier (Montreal, QC)

Hello folks,

I have a question about which pipeline I should use to run some QC on my single-cell sequencing samples.
Here is the design of our experiment. For each individual, single sperm cells were sequenced: I have 192 WGS samples per individual, all sequenced on the same flowcell, and I have 20 such individuals. Each sample is haploid and will have ~1X coverage.
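To put numbers on the design, here is a quick back-of-the-envelope sketch. The pooled depth assumes the ~1X per cell simply adds up across cells, which ignores duplicates and uneven coverage, so treat it as a nominal figure only.

```python
# Numbers taken from the experiment description above.
individuals = 20
cells_per_individual = 192
coverage_per_cell = 1.0  # approximate: ~1X per haploid cell

# Total number of single-cell samples across the whole experiment.
total_samples = individuals * cells_per_individual

# Nominal depth if all cells of one individual are merged into a bulk sample
# (assumption: per-cell coverage adds linearly).
pooled_depth_per_individual = cells_per_individual * coverage_per_cell

print(total_samples)                # 3840
print(pooled_depth_per_individual)  # 192.0
```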

My first question: will GATK be able to handle 20 x 192 samples for genotyping? Can GATK deal with that kind of coverage, or will it give me weird results?

Then, I want to do some QC against genotyping data we already have from a genotyping array, so I want to create a bulk sample from those 192 samples. I have already tested several pipelines, but I'm not sure what the best way to proceed is. Here are the pipelines I tried:

1) Process each sample separately with HaplotypeCaller (HC) in GVCF mode -> change the header of each GVCF so that every sample has the same sample name -> run GenotypeGVCFs on all those samples to get a VCF with one sample (I forgot to change the ploidy parameter...)

2) Change the RG tag in the BAM files -> run HC in GVCF mode on all the BAMs at the same time -> run GenotypeGVCFs on the resulting g.vcf file.

3) (Not done yet) Same as 1, but with the ploidy set to 1.
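For reference, the HC step in these pipelines could be scripted roughly as below. This is only a sketch that assembles the command line, assuming GATK4-style syntax (`-ERC GVCF`, `--sample-ploidy`); GATK3 spells these options differently. The helper name `haplotypecaller_cmd` and all file names are hypothetical placeholders, not taken from my actual runs.

```python
from typing import List


def haplotypecaller_cmd(reference: str, bams: List[str],
                        out_gvcf: str, ploidy: int) -> List[str]:
    """Build a HaplotypeCaller command in GVCF mode (GATK4 syntax assumed).

    One BAM corresponds to pipelines 1 and 3 (per-cell calling);
    several BAMs correspond to pipeline 2 (all BAMs at the same time).
    """
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = ["gatk", "HaplotypeCaller", "-R", reference]
    for bam in bams:
        cmd += ["-I", bam]  # -I can be repeated for multiple inputs
    cmd += ["-O", out_gvcf, "-ERC", "GVCF", "--sample-ploidy", str(ploidy)]
    return cmd


# Pipeline 3: a single cell, ploidy set to 1 (file names are placeholders).
cmd = haplotypecaller_cmd("ref.fasta", ["cell_001.bam"],
                          "cell_001.g.vcf.gz", ploidy=1)
print(" ".join(cmd))
```

The list returned by the helper can be passed to `subprocess.run` once the paths point at real files.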

I'm wondering, though, about the ploidy parameter. Here's what is written on the GATK HC doc page:

Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).

What should it be in my case? Should I just use 2, since I'm working with 192 sperm cells coming from a diploid organism?
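To make the two candidate values concrete, here is the formula from the doc page applied literally. This only does the arithmetic; which reading is appropriate for pooled sperm cells is exactly what I'm unsure about. The function name `pooled_ploidy` is just for illustration.

```python
def pooled_ploidy(samples_in_pool: int, sample_ploidy: int) -> int:
    """Doc-page formula: number of samples in each pool * sample ploidy."""
    return samples_in_pool * sample_ploidy


# Reading A: the pool is 192 haploid sperm cells.
print(pooled_ploidy(192, 1))  # 192

# Reading B: the pool stands in for one diploid individual.
print(pooled_ploidy(1, 2))    # 2
```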

Thanks a lot for your help.

