How to use GATK for RNA-seq analysis?

danielyin Posts: 7 Member

Hi all: Among the GATK workflows listed at http://www.broadinstitute.org/gatk/guide/topic?name=methods-and-workflows I cannot find any for RNA-seq analysis. I understand that GATK mainly focuses on variant calling. Can anyone tell me how to use GATK for RNA-seq analysis?

Thanks, Daniel

Answers

  • Geraldine_VdAuwera Posts: 5,293 Administrator, GSA Member

    Hi Daniel,

    We have indeed not yet formulated any best practices specific to calling variants from RNAseq data. The basic workflow should be the same as the generic Best Practices workflow, but some steps probably need to be adapted. We do not have the expertise to identify these points, but we know that some of our users have used the GATK successfully on RNAseq data. Hopefully some of them will have the time and inclination to share their experience with you here.

    Geraldine Van der Auwera, PhD

  • danielyin Posts: 7 Member
    edited June 2013

    Hi Geraldine: Thanks very much for your prompt reply.

    Actually, what I mean is not calling variants from RNAseq data. We are surveying tools for RNAseq analysis in the sense of gene expression analysis rather than variant calling, such as TopHat, or GenePattern, which the Broad Institute has developed.

    My question is how we can use GATK in a gene expression analysis workflow, which typically includes the following steps:

    - Align RNA-seq data to a reference genome
    - Estimate known gene and transcript expression
    - Perform differential expression analysis
    - Detect expressed gene fusions
    - Discover novel isoforms
    - Visualize and summarize the output of RNA-seq analyses

    Thanks, Daniel

  • Geraldine_VdAuwera Posts: 5,293 Administrator, GSA Member

    Oh, I see -- then I'm afraid I have to disappoint you; we don't have any tools for expression analysis in the GATK at this time.

    Geraldine Van der Auwera, PhD

  • danielyin Posts: 7 Member

    Thanks for your reply!

  • danielyin Posts: 7 Member

    I confused gene expression analysis with RNAseq analysis. Is it possible to use GATK for RNAseq analysis, similar to the RNAseq analysis functionality in GenePattern?

    Thanks. Daniel

  • drchriscole Posts: 16 Member

    I think you're missing the point of GATK. It's for analysing genomic data (i.e. sequencing data generated from DNA), and specifically for calling variants between genomes or exomes.

    Gene expression analysis (i.e. analysis of RNA transcript data) is completely beyond the scope of its main purpose.

  • danielyin Posts: 7 Member

    Thanks very much for your reply. Daniel

  • JahnDavik Bioforsk Posts: 2 Member

    Do I understand correctly that SNP identification in de novo assemblies from RNAseq data would be feasible? I have a de novo transcriptome assembly generated by Trinity and would like to identify SNPs in the genotypes that this assembly is based on. Would GATK be suitable for this job? Thanks.

  • Geraldine_VdAuwera Posts: 5,293 Administrator, GSA Member

    Hi @JahnDavik,

    The current GATK is not designed to handle RNAseq data, so while it can in theory be done to some degree, there are various pitfalls involved and we don't provide support for it. However, we have been working on some new tools and methods to build in support for RNAseq data analysis, which we hope to make available to the public in the next release of GATK.

    Geraldine Van der Auwera, PhD

  • JahnDavik Bioforsk Posts: 2 Member

    OK. Thanks for the reply. When is the next release due?

  • Geraldine_VdAuwera Posts: 5,293 Administrator, GSA Member

    We don't have a set schedule, but I think it'll be at least three to four weeks.

    Geraldine Van der Auwera, PhD
