filter for strand bias in stranded RNAseq?

Hello,

I was wondering whether it makes sense to filter for strand bias, as stated in the Best Practice RNAseq Variant Calling guide, given that most of today's RNAseq data is strand specific. I would actually expect variants to show high strand bias, and would be suspicious of variants which do NOT show strand bias =)
...or did I get something wrong with the Fisher Strand values?

Thank you

Answers

  • Sheila (Broad Institute; Member, Broadie)

    @michel
    Hi,

    You are right that the RNA molecules themselves are produced in a strand specific manner. However, you then generate cDNA for sequencing. The cDNA is sequenced from both strands; hence you expect no strand bias in the sequencing.

    -Sheila

  • AndresRibone (Member)

    @Sheila said:
    "The cDNA is sequenced from both strands; hence you expect no strand bias in the sequencing."

    What about single end data?
    We have these RNAseq samples (Illumina stranded 2x100) in which ~70% of read pairs overlapped, so we decided to merge the mates into single-end reads (we used FLASH). The merged reads had better mapping metrics. Should I disable the FisherStrand filter?

  • Sheila (Broad Institute; Member, Broadie)

    @AndresRibone
    Hi,

    ~70% of reads overlapped so we decided to fuse the mates creating single end reads

    Can you explain this a bit more? What do you mean by 70% of reads overlapped? Is that expected?

    Did you generate variants with HaplotypeCaller before fusing and after fusing? Can you post some example records that contain the FS annotation?

    Thanks,
    Sheila

  • AndresRibone (Member), edited June 2018

    @Sheila
    Hi,
    Apparently, 70% of the cDNA fragments (from which the paired reads were sequenced) were shorter than 200 bases. That is the only explanation I have for why the paired reads overlapped.

    Original fragment:
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG-3'

    Reads (stranded):
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGA-3'
    ...............3'-ATCGATCGAGCATCGACACGCTAGCTAGTCGATCATTGGC-5'

    "Synthetic" new stranded single end read:
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG-3'
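
    The merge illustrated above can be sketched in a few lines of Python. This is only a toy, exact-match version for illustration: the function names are made up here, and FLASH itself tolerates mismatches and scores candidate overlaps, which this sketch does not attempt.

```python
# Toy illustration of merging an overlapping read pair (exact matches only).
# Hypothetical helper names; this is NOT how FLASH is implemented.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=10):
    """Merge read1 with the reverse complement of read2 (both given 5'->3').

    Tries the longest overlap first and returns None if no exact
    overlap of at least min_overlap bases is found.
    """
    r2rc = reverse_complement(read2)
    for k in range(min(len(read1), len(r2rc)), min_overlap - 1, -1):
        if read1[-k:] == r2rc[:k]:
            return read1 + r2rc[k:]
    return None

# Sequences taken from the example in this post.
fragment = "ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG"
read1 = "ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGA"
# Read 2 is written 3'->5' in the post; reverse it to get the 5'->3' read.
read2 = "ATCGATCGAGCATCGACACGCTAGCTAGTCGATCATTGGC"[::-1]

merged = merge_pair(read1, read2)
print(merged == fragment)
```

    Note that the merged read keeps the orientation of read 1, so every merged read from a stranded library reports the same strand of the original fragment.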

    With the merged reads I got better mapping with STAR, so all the GATK steps I have run so far used this alignment of mostly single-end reads.

    Sorry, what is the FS annotation?

    Thanks in advance!

  • Sheila (Broad Institute; Member, Broadie)

    @AndresRibone
    Hi,

    I see. Okay, well if the mapping is better with the fused reads, it may be best to stick with those.

    You can read more about FisherStrand here.
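
    In short, FS is the Phred-scaled p-value of a Fisher's exact test on the 2x2 table of reference/alternate read counts on the forward/reverse strands. A minimal sketch of that calculation in Python is below; the function names are illustrative only, and GATK's own implementation may apply additional handling of the counts, so the values need not match the FS in a VCF exactly.

```python
# Simplified sketch of a FisherStrand-like score (illustrative names only;
# not GATK's actual code).
from math import comb, log10

def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact test p-value for a 2x2 strand table."""
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt
    if total == 0:
        return 1.0
    denom = comb(total, col_fwd)

    def prob(a):
        # Hypergeometric probability of seeing `a` ref reads on the forward strand.
        return comb(row_ref, a) * comb(row_alt, col_fwd - a) / denom

    p_obs = prob(ref_fwd)
    lo = max(0, col_fwd - row_alt)
    hi = min(row_ref, col_fwd)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def fs_score(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scale the p-value: 0 means no evidence of strand bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * log10(max(p, 1e-320)))

# Reads evenly split across strands: no evidence of bias.
print(fs_score(10, 10, 10, 10))
# Ref reads only on forward, alt reads only on reverse: strong bias.
print(fs_score(10, 0, 0, 10))
```

    If every read in the library comes from the same strand, as with merged stranded reads, the table has an empty column and the test cannot show bias, so the score carries no information for such data.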

    -Sheila
