Filter for strand bias in stranded RNAseq?

Hello,

I was wondering whether it makes sense to filter for strand bias, as recommended in the Best Practices guide for RNAseq variant calling, given that most of today's RNAseq data is strand-specific. I would actually expect variants to show high strand bias, and I would be suspicious of variants that do NOT show strand bias =)
...or did I get something wrong about the FisherStrand values?

Thank you
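
For reference, FisherStrand (FS) is the Phred-scaled p-value of Fisher's exact test on a 2x2 table that counts reads supporting the reference and the alternate allele, split by the strand each read aligned to. Because it compares the strand split of the alt reads against that of the ref reads, a site where every read comes from the same strand gives an empty table column and scores 0, not a high value. The sketch below is a simplified illustration with hypothetical function names, not GATK's code; GATK's own annotation may differ in details such as how reads are counted and how very large tables are handled.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row
    and column totals that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of the table whose top-left cell is x (margins fixed).
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    # The small tolerance keeps floating-point ties with the observed table.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def fisher_strand(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand-bias score: 0 means no evidence of bias."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(p))


# Ref and alt reads split evenly across strands: no bias.
print(fisher_strand(10, 10, 5, 5))   # close to 0
# Every read (ref and alt) on the forward strand, as with one-strand data:
# the table has an empty column, so the test has nothing to detect.
print(fisher_strand(20, 0, 10, 0))   # 0.0
# Ref balanced but alt only on the forward strand: this is what FS flags.
print(fisher_strand(10, 10, 10, 0))  # roughly 19.6
```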

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @michel
    Hi,

    You are right that the RNA molecules themselves are produced in a strand specific manner. However, you then generate cDNA for sequencing. The cDNA is sequenced from both strands; hence you expect no strand bias in the sequencing.

    -Sheila
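
    As an illustration of what "sequenced from both strands" looks like in an alignment: SAM FLAG bit 0x10 records that a read aligned to the reverse strand, and bit 0x80 marks read 2 of a pair. Assuming a reverse-stranded, dUTP-type library such as Illumina TruSeq Stranded (the thread does not say which kit was used), read 1 is antisense to the transcript and read 2 is sense, so the two mates of one fragment align to opposite reference strands while implying the same transcript strand. The helper names below are hypothetical.

```python
# SAM FLAG bits, as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10
FLAG_READ2 = 0x80


def alignment_strand(flag: int) -> str:
    """Reference strand the read itself aligned to ('+' or '-')."""
    return "-" if flag & FLAG_REVERSE else "+"


def transcript_strand(flag: int) -> str:
    """Transcript strand implied by a read, for a reverse-stranded library.

    Assumption (dUTP-type protocol): read 1 is antisense to the transcript
    and read 2 is sense. Unpaired reads are treated like read 1, which is
    an assumption that should be checked for merged or single-end data.
    """
    strand = alignment_strand(flag)
    if flag & FLAG_PAIRED and flag & FLAG_READ2:
        return strand  # read 2 already matches the transcript
    return "+" if strand == "-" else "-"  # read 1 / unpaired: flip


# A properly paired fragment from a plus-strand transcript:
# read 1 with FLAG 83 (paired, proper, reverse, first in pair) and
# read 2 with FLAG 163 (paired, proper, mate reverse, second in pair).
for flag in (83, 163):
    print(flag, alignment_strand(flag), transcript_strand(flag))
# 83 - +
# 163 + +
```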

  • AndresRibone (Member)

    What about single-end data?
    We have RNAseq samples (Illumina stranded, 2x100) in which ~70% of the read pairs overlapped, so we decided to merge the mates into single-end reads (using FLASH). The merged reads had better mapping metrics. Should I disable the FisherStrand filter?

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @AndresRibone
    Hi,

    ~70% of reads overlapped so we decided to fuse the mates creating single end reads

    Can you explain this a bit more? What do you mean by 70% of reads overlapped? Is that expected?

    Did you generate variants with HaplotypeCaller before fusing and after fusing? Can you post some example records that contain the FS annotation?

    Thanks,
    Sheila
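
    To compare the FS annotation in the calls made before and after fusing, the value can be read from the INFO column (the 8th tab-separated field) of each VCF record, where GATK writes it as FS=<value>. A minimal plain-Python sketch follows; the function names and the sample record are hypothetical, and the record is not from the poster's data.

```python
from typing import Dict, Optional, Union


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split a VCF INFO column into a dict; flag keys without '=' map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    if info == ".":
        return fields  # "." means the INFO column is empty
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def fs_from_vcf_line(line: str) -> Optional[float]:
    """Return the FS value from one VCF data line, or None if it has none."""
    if line.startswith("#"):
        return None  # header line
    columns = line.rstrip("\n").split("\t")
    value = parse_info(columns[7]).get("FS")  # INFO is the 8th column
    return float(value) if isinstance(value, str) else None


# Made-up record for illustration only.
record = "chr1\t12345\t.\tA\tG\t500.77\t.\tAC=1;AF=0.500;AN=2;DP=30;FS=19.588;QD=16.69\tGT:AD\t0/1:20,10"
print(fs_from_vcf_line(record))  # 19.588
```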

  • AndresRibone (Member)
    edited June 2018

    @Sheila
    Hi,
    Apparently, 70% of the cDNA fragments (from which the paired reads were sequenced) were shorter than 200 bases. That is the only explanation I can see for why the paired reads overlapped.

    Original fragment:
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG-3'

    Reads (stranded):
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGA-3'
    ...............3'-ATCGATCGAGCATCGACACGCTAGCTAGTCGATCATTGGC-5'

    "Synthetic" new stranded single end read:
    5'-ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG-3'

    This merging gave me better mapping with STAR, so all the GATK steps I have run so far used the "mostly single-end reads" alignment.

    Sorry, what is the FS annotation?

    Thanks in advance!
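
    The fusing shown in the diagram above can be sketched as follows: reverse-complement read 2 onto read 1's strand, find the longest suffix of read 1 that exactly matches a prefix of that sequence, and join the two. This is a simplified illustration with hypothetical names, not FLASH's algorithm: FLASH also allows a fraction of mismatches, takes minimum and maximum overlap settings, and combines base qualities, while this sketch ignores qualities and does not handle fragments shorter than one read. The min_overlap value of 10 is an arbitrary choice. The arithmetic behind the 200-base figure is included too: two 100-base mates share 2 x 100 minus the fragment length bases whenever the fragment is shorter than 200 bases.

```python
from typing import Optional

# Complement table for uppercase DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_overlap(fragment_len: int, read_len: int = 100) -> int:
    """Bases shared by two mates of read_len sequenced from one fragment."""
    return max(0, 2 * read_len - fragment_len)


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge mates whose 3' ends overlap exactly; return None if they don't.

    Both reads are given 5'->3' as sequenced. Read 2 is reverse-complemented
    onto read 1's strand and the longest exact suffix/prefix overlap wins.
    """
    rc2 = reverse_complement(read2)
    for k in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        if read1[-k:] == rc2[:k]:
            return read1 + rc2[k:]
    return None


fragment = "ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGATCGATCAGCTAGTAACCG"
read1 = "ATCGTGCATCTAGCTTAGCTAGCTCGTAGCTGTGCGA"
# The diagram writes read 2 3'->5' under the fragment; reversing the string
# gives read 2 5'->3', the way the sequencer reports it.
read2 = "ATCGATCGAGCATCGACACGCTAGCTAGTCGATCATTGGC"[::-1]

merged = merge_pair(read1, read2)
print(merged == fragment)     # True
print(expected_overlap(150))  # 50
```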

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @AndresRibone
    Hi,

    I see. Okay, well if the mapping is better with the fused reads, it may be best to stick with those.

    You can read more about FisherStrand here.

    -Sheila
