RNAseq short variant discovery (SNPs + Indels)

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)
edited January 9 in Best Practices Workflows

Purpose

Identify short variants (SNPs and Indels) in RNAseq data.


Diagram is not available


Reference Implementations

Pipeline | Summary | Notes | Github | FireCloud
RNAseq short variant per-sample calling | BAM to VCF | universal (expected) | :) | TBD

Expected input

This workflow is designed to operate on a set of samples one sample at a time; joint calling of RNAseq data is not supported.
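
As a rough illustration of the one-sample-at-a-time design, the sketch below builds an independent set of GATK4 command lines for each input BAM. It is a minimal sketch, not the released workflow: the two tool names are taken from the discussion in the comments on this page, the function name `per_sample_commands`, the file-naming convention, and the output layout are hypothetical, and the real pipeline may include further steps and arguments that are not shown.

```python
from pathlib import Path


def per_sample_commands(bams, reference, out_dir):
    """Build GATK4 command lines separately for each BAM (no joint calling).

    Hypothetical helper: sample names are assumed to be the BAM file stems,
    and only the basic -R/-I/-O arguments are set.
    """
    plans = {}
    for bam in bams:
        sample = Path(bam).stem  # assumption: one BAM per sample
        split_bam = str(Path(out_dir) / f"{sample}.split.bam")
        vcf = str(Path(out_dir) / f"{sample}.vcf.gz")
        plans[sample] = [
            # Split reads that span splice junctions (N in the CIGAR string).
            ["gatk", "SplitNCigarReads", "-R", reference, "-I", bam, "-O", split_bam],
            # Call short variants on this sample only.
            ["gatk", "HaplotypeCaller", "-R", reference, "-I", split_bam, "-O", vcf],
        ]
    return plans


if __name__ == "__main__":
    plans = per_sample_commands(["data/s1.bam", "data/s2.bam"], "ref.fasta", "out")
    for sample, commands in plans.items():
        for command in commands:
            print(sample, " ".join(command))
```

Each sample gets its own pair of commands and its own VCF, so nothing is shared between samples; the commands are only printed here and could be handed to `subprocess.run` or a workflow engine.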


This workflow is in development; detailed documentation will be made available when the workflow is considered fully released.


Comments

  • Hi GATK development team,

    I am currently using GATK for variant calling on RNAseq data. I think my analysis would really benefit from joint variant calling. So I have two questions:

    • Do you know when joint calling will be officially implemented for RNAseq?
    • In the meantime, is there a way to perform joint calling anyway (incremental analysis is not necessary, since I have fewer than 20 runs and enough computational resources)?

    Thank you for your help!

  • sbourgeois, London, UK (Member)

    Hi @Geraldine_VdAuwera ,

    first of all, congrats to the whole team, this is great work, and the ability for everyone to use a comparable pipeline should prove very valuable.

    Would you have any estimate as to when the RNAseq pipeline would be available on FireCloud?

    Also, the summary indicates BAM to VCF, and not uBAM; should we understand that mapping isn't part of the pipeline? In that case, as the aligner seems to be one of the largest sources of discrepancies between pipelines, wouldn't that somewhat defeat the purpose? (I'm particularly thinking of comparison with exomes in gnomAD.)

    Cheers,

    Steph

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Thanks @sbourgeois :)

    We should be able to get the pipeline into FC fairly quickly if that would be helpful to you.

    Due to the idiosyncrasies of the project this particular WDL was developed for internally, it has a built-in RevertSam step that allows it to take any aligned BAM and reprocess it from scratch, including alignment with STAR. Ultimately I would like to take that out and standardize on uBAM as the starting point of the pipeline. We already have a separate WDL for the reversion process for those who need it.

    But that would add on some time for the script modifications; we could already make the existing version available in FC if you think you could work with that.

  • sbourgeois, London, UK (Member)

    Hi @Geraldine_VdAuwera ,

    there is no urgency; I think it's better to wait until the WDL is modified to start from uBAMs. It would be quite wasteful to align the reads just to then run the resulting files through RevertSam.

    Thanks a lot :smile:

  • clareau (Member, Broadie)

    Hi GATK development team,

    I was planning on doing genotyping of RNA-seq data for several samples this week. I noticed that the GATK4 RNA-seq workflow is still "in development". Would you all recommend still genotyping RNA-seq data with GATK3 to ensure accuracy at this time? Or is GATK4 fine to use (I can write my own WDL methods if needed)?

    Thanks!
    -Caleb

  • Sheila, Broad Institute (Member, Broadie, Moderator admin)

    @clareau
    Hi Caleb,

    I am assuming you are asking about the workflow here. In that workflow, HaplotypeCaller and SplitNCigarReads still use GATK3. You should be fine substituting the GATK4 versions of those tools. The team has ported them to GATK4 but has not validated them in the workflow yet. It would be great if you could try it out and let us know how it works.

    -Sheila
