Is my RNAseq experimental design suitable for SNP calling?

Hello fellows,

I am a newbie here and would highly appreciate your advice about one particular experimental design.

We have data from an RNAseq experiment that was originally designed to assess differential expression. The details of the experiment are as follows:

2 modalities of the phenotype

Each phenotype is represented by 4 samples. 1 sample = 60 individuals pooled together at the stage of RNA isolation.

Molecule – polyadenylated mRNA

Sequencing chemistry – Illumina paired-end, read length 2 × 100 bp

My question is whether it is valid to use this RNAseq data to call SNPs. I searched earlier and found that most people calling SNPs from RNAseq use 40-1000 samples (= individuals), but they designed their RNAseq experiments for GWAS from the start. I see that this analysis cannot be applied to my data (at least because in my case individual flies were pooled without barcoding, 60 flies per sample). However, can I still call SNPs and upload the list to a database as a list of potential targets for GWAS, with, for example, an estimate of their functional impact on protein structure? Will they be “true” SNPs, or does our experimental design make even this step invalid?

I found this paper, https://www.ncbi.nlm.nih.gov/pubmed/27458203, in which the authors used 2 phenotypes, each represented by 2 samples, which is almost like our experiment, but I still have doubts.

Best Answer

  • Geraldine_VdAuwera Cambridge, MA admin
    Accepted Answer

    Hi Nina, unfortunately it's difficult to get reliable calls from pooled samples, especially from such a large number pooled together. Typically that sort of experimental design is used for extracting very common variation only, for phylogenetic analysis for example. I really wouldn't recommend using this for something as statistically demanding as GWAS.
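The point about pooled samples can be illustrated with a rough binomial sketch. It assumes diploid flies (so 60 flies contribute 120 chromosomes per pool) and that every chromosome contributes equally to the reads, which RNAseq does not guarantee because expression can differ between alleles and individuals. The depth, the minimum number of supporting reads, and the error rate are hypothetical values chosen only for illustration, and `prob_at_least` is a helper written for this example, not part of GATK.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


POOL_CHROMOSOMES = 120   # 60 flies x 2 (assumes diploid individuals)
DEPTH = 100              # hypothetical read depth at the site
MIN_ALT_READS = 3        # hypothetical evidence threshold
ERROR_RATE = 0.01 / 3    # hypothetical: 1% base errors, split over 3 wrong bases

# Chance that a real allele present on 1, 12, or 60 of the pooled
# chromosomes is supported by at least MIN_ALT_READS reads.
for copies in (1, 12, 60):
    freq = copies / POOL_CHROMOSOMES
    detected = prob_at_least(MIN_ALT_READS, DEPTH, freq)
    print(f"{copies:3d} copies (freq {freq:.3f}): P(detect) = {detected:.3f}")

# Chance that sequencing errors alone produce the same amount of support
# for one specific wrong base.
noise = prob_at_least(MIN_ALT_READS, DEPTH, ERROR_RATE)
print(f"errors only: P(false support) = {noise:.4f}")
```

Under these assumptions, an allele carried on a single chromosome is expected in fewer than 1% of reads, the same order of magnitude as the assumed error rate, whereas an allele shared by many individuals stands well clear of it. That is consistent with pooled designs being used mainly for very common variation.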

Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @windhavenn
    Hi,

    Can you tell us a little more about your end goal?

    You may be interested in ASEReadCounter which can help you with differential expression analysis.

    For calling SNPs, you can use HaplotypeCaller, but you may need to play around with the ploidy setting.

    -Sheila
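For the ploidy setting mentioned above, a pool is usually treated as one sample whose ploidy is the number of pooled individuals times the ploidy of each individual. The sketch below works that out for this design, assuming the flies are diploid. The function names are hypothetical helpers for this example. As far as I know, the resulting number is what would be passed to HaplotypeCaller's sample ploidy argument, and very high ploidy values can make calling slow.

```python
from fractions import Fraction


def pool_ploidy(individuals: int, ploidy_per_individual: int = 2) -> int:
    """Total number of chromosome copies in one pooled sample."""
    return individuals * ploidy_per_individual


def min_allele_fraction(individuals: int, ploidy_per_individual: int = 2) -> Fraction:
    """Smallest non-zero allele frequency the pool can contain (one copy)."""
    return Fraction(1, pool_ploidy(individuals, ploidy_per_individual))


# 60 flies per sample, assumed diploid.
ploidy = pool_ploidy(60)
print(f"pool ploidy: {ploidy}")
print(f"smallest allele fraction: {float(min_allele_fraction(60)):.4f}")
```

With 60 diploid individuals the pool ploidy is 120, so the caller would have to distinguish allele fractions in steps of 1/120.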

  • Hi Sheila,

    Thank you for the reply!

    We have already done the differential expression analysis and were wondering whether we can get more reliable information from our data without spending extra money, for example by finding SNPs in the CDS and linking them to the differential expression. But I did not find papers with strict rules about the statistics of SNP calling. From forums and personal communications I found out that associating SNPs with any trait (expression, splicing, or ROS production, which we also planned to look at) can be done only by genotyping a large number of individual samples independently. But we had 60 individuals pooled together in 1 vial, and this is an obstacle for GWAS.

    My second question was whether we can at least find SNPs and be sure that they are true SNPs, not sequencing errors or biases due to the level of gene expression and the PCR step during library preparation. If we could find these "true" SNPs and upload them to a database, later we or other people could use them as a basis for further study. But on the GATK forum I found that SNPs found in RNAseq data do not necessarily exist in the DNA, and therefore cannot be treated as genotypes per se without confirmation by targeted DNAseq. We are not going to do DNAseq, so this option also seems to be closed.

    I would be grateful if you could tell me whether anyone has done a SNP search on a small number of samples, or on a few pooled samples as we have, and gotten reliable results.

    All the best,

    Nina.

  • @Geraldine_VdAuwera said:
    Hi Nina, unfortunately it's difficult to get reliable calls from pooled samples, especially from such a large number pooled together. Typically that sort of experimental design is used for extracting very common variation only, for phylogenetic analysis for example. I really wouldn't recommend using this for something as statistically demanding as GWAS.

    Thank you for the answer, Geraldine! I think this issue is closed now.
