Why does CollectInsertSizeMetrics sometimes display a dual output?

Sbamo, Munich (Member)

Hello,

I noticed that CollectInsertSizeMetrics writes two rows of metrics for some samples.
The samples come from bone marrow cells of patients with haematological malignancies.
I tested the samples using a local Galaxy instance. The samples were trimmed using Trimmomatic, aligned with TopHat2 and reordered with the Picard tool "ReorderSam".

The version of "CollectInsertSizeMetrics" is 1.136.0.
The two rows differ in all columns of the output (e.g. MEDIAN_INSERT_SIZE: 233 vs. 3815, READ_PAIRS: 22602994 vs. 2992312, PAIR_ORIENTATION: FR vs. RF, and so on).

Has anyone encountered similar issues?

Thank you in advance
Sbamo

GitHub issue #408 (filed by Sheila; state: closed; closed by chandrans)

Answers

  • Sheila, Broad Institute (Member, Broadie admin)

    @Sbamo
    Hi Sbamo,

    I'm not sure exactly what you mean. Does your input BAM file contain more than one sample? Can you post an example of your output?

    Thanks
    Sheila

  • Sbamo, Munich (Member)

    [attached image of the output]

    Hi Sheila,

    I attached an image of my output. My BAM file contains only one paired-end (PE) sample.

  • Sheila, Broad Institute (Member, Broadie admin)

    @Sbamo
    Hi Sbamo,

    Can you please try with the latest version of Picard? This might solve the issue.

    Thanks,
    Sheila

  • Sheila, Broad Institute (Member, Broadie admin)

    @Sbamo
    Hi again Sbamo,

    Another team member has suggested that the tool is picking up read pairs in the two different orientations (FR and RF). If it finds enough pairs in a given orientation, it will output another row of insert size metrics for the pairs in that category.
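As a rough illustration of the "enough pairs in a given orientation" rule, here is a minimal Python sketch. It is not Picard's code: the function name `orientations_to_report` is hypothetical, and the 0.05 cutoff is an assumed value (Picard's documentation describes a MINIMUM_PCT option for this kind of threshold, so check the setting in your version). The counts are the READ_PAIRS values quoted in the question.

```python
def orientations_to_report(pair_counts, min_fraction=0.05):
    """Return the orientations whose share of all read pairs reaches min_fraction.

    min_fraction=0.05 is an assumed cutoff for illustration only.
    """
    total = sum(pair_counts.values())
    if total == 0:
        return []
    return [name for name, n in pair_counts.items() if n / total >= min_fraction]


# READ_PAIRS values reported in the question (FR row and RF row).
counts = {"FR": 22602994, "RF": 2992312}
reported = orientations_to_report(counts)
print(reported)  # ['FR', 'RF']
```

With these numbers the RF pairs make up roughly 12% of all pairs, so under the assumed cutoff both orientations would get their own row.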

    Do you have data from sequencing a jumping library (which produces RF "outie" pairs, but also sometimes includes FR "innie" pairs when the pulldown is inefficient)?

    Thanks,
    Sheila

  • Sheila, Broad Institute (Member, Broadie admin)

    @Sbamo
    Hi again!

    Here is some extra information about the RF and FR pairs. Standard Illumina paired-end libraries produce a forward read followed by a reverse read, ---> <---, in each pair. This is the FR orientation, nicknamed "innie" because the reads point in. RF indicates a reverse read followed by a forward read, <--- --->, nicknamed "outie" because the reads point out.

    FR: 5' --F--> <--R-- 5'
    RF: <--R-- 5' 5' --F-->

    There is also a third pair orientation, called tandem, in which both reads point the same way.
    TANDEM: 5' --F--> 5' --F--> or <--R-- 5' <--R-- 5'
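The three diagrams above can be sketched as a small classifier. This is a simplified illustration, not the tool's implementation: the `Read` type and `pair_orientation` function are hypothetical names, and the sketch assumes both reads are mapped to the same reference sequence and that each read is described only by the coordinate of its 5' end and its strand.

```python
from typing import NamedTuple


class Read(NamedTuple):
    five_prime: int   # reference coordinate of the read's 5' end
    is_reverse: bool  # True if the read aligns to the reverse strand


def pair_orientation(a: Read, b: Read) -> str:
    """Classify a read pair as FR, RF or TANDEM (simplified sketch)."""
    # Both reads on the same strand: they point the same way.
    if a.is_reverse == b.is_reverse:
        return "TANDEM"
    # Identify which read is on the forward strand and which on the reverse.
    fwd, rev = (b, a) if a.is_reverse else (a, b)
    # Forward 5' end left of the reverse 5' end: reads point in (FR).
    # Otherwise the reads point away from each other (RF).
    return "FR" if fwd.five_prime < rev.five_prime else "RF"


print(pair_orientation(Read(100, False), Read(400, True)))   # FR
print(pair_orientation(Read(100, True), Read(400, False)))   # RF
print(pair_orientation(Read(100, False), Read(400, False)))  # TANDEM
```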

    You can read more about these pair orientations here.

    I hope this helps!

    -Sheila
