
Regarding ploidy in HaplotypeCaller for multiple replicates of pooled RNA-seq

cjaln1994, Melbourne, Member

Hi,
I am a little confused about the best practice for running HaplotypeCaller to call variants given the pooled nature of my study. Any feedback is super appreciated!

I have 10 replicates of pooled RNA-seq data for each of two samples (10 replicates for Sample A, 10 replicates for Sample B). By pooled I mean each replicate contains mRNA from 20 individuals, all mixed together with no barcoding (population genetics study).

I had planned to just merge the BAM files of these replicates, which have RGSMs of SampleA and SampleB, and simply run HaplotypeCaller for Sample A and Sample B. However, that would mean setting ploidy = 2 x 200 = 400 (10 replicates x 20 individuals, 2 chromosome copies each). This seems very high!
Would it be better to run HaplotypeCaller on each replicate separately, without merging the BAM files, setting ploidy = 2 x 20 = 40, and then use a tool such as CombineVariants to stack my VCF files into two samples for downstream comparisons?
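
The ploidy arithmetic behind the two options can be sketched as follows. This is only an illustration: the function and variable names are made up for the example, and it assumes every individual in a pool is diploid and contributes to the pool.

```python
# Assumption: each individual is diploid (2 chromosome copies).
CHROMOSOMES_PER_INDIVIDUAL = 2

individuals_per_replicate = 20   # individuals pooled in one replicate
replicates_per_sample = 10       # replicates for Sample A (same for Sample B)


def pooled_ploidy(n_individuals, copies=CHROMOSOMES_PER_INDIVIDUAL):
    """Ploidy of a pool = number of individuals x chromosome copies each."""
    return n_individuals * copies


# Option 1: merge all replicates of a sample into one pool.
merged_ploidy = pooled_ploidy(individuals_per_replicate * replicates_per_sample)

# Option 2: call each replicate on its own.
per_replicate_ploidy = pooled_ploidy(individuals_per_replicate)

print("merged:", merged_ploidy)
print("per replicate:", per_replicate_ploidy)
```

The merged option is ten times the per-replicate value, which is where the 2 x 200 versus 2 x 20 difference comes from.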
Any advice?
Regards!
Chris

Answers

  • cjaln1994, Melbourne, Member

    I also thought about trying this:

    1. Set unique sample IDs for all 20 replicates (e.g. SampleA1, SampleA2, etc. and SampleB1, SampleB2, etc.)
    2. Merge the files and run HC, which recognises them as 20 separate samples (run HC with ploidy = 2 x 20 = 40)
    3. At the VCF stage, find a way to merge these variants such that SampleA1, SampleA2, etc. all collapse into SampleA, and SampleB1, SampleB2, etc. collapse into SampleB
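
The bookkeeping for step 3 could look something like the sketch below. It only groups the replicate sample IDs under their parent sample name. How the genotypes themselves should be combined is left open, and the helper name `parent_sample` and the ID pattern are assumptions based on the naming in step 1.

```python
import re
from collections import defaultdict

# Hypothetical replicate IDs following the naming in step 1:
# SampleA1..SampleA10 and SampleB1..SampleB10.
replicate_ids = [f"Sample{group}{i}" for group in "AB" for i in range(1, 11)]


def parent_sample(replicate_id: str) -> str:
    """Strip the trailing replicate number, e.g. 'SampleA7' -> 'SampleA'."""
    return re.sub(r"\d+$", "", replicate_id)


# Map each collapsed sample name to the replicate columns it would absorb.
groups = defaultdict(list)
for rid in replicate_ids:
    groups[parent_sample(rid)].append(rid)

for sample, members in groups.items():
    print(sample, len(members), members)
```

A mapping like this could then drive whatever per-site merging is chosen for the replicate columns in the VCF.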