The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

CollectRnaSeqMetrics - RIBOSOMAL_BASES

Hi,
I am having a problem with RIBOSOMAL_BASES estimated as 0.
When I open the mapped reads I observe several reads at ribosomal positions.
The same problem is discussed in another forum but no solution is described: https://sourceforge.net/p/samtools/mailman/message/29445029/

I already search for a similar problem in FAQ and GATK forum and also run ValidateSamFile.
The chromosome names are the same in the refFlat file, BAM and interval_list.

Can anyone help me?

Command line:
java -jar picard.jar CollectRnaSeqMetrics I=input--bowtie2-RG.sorted.fixed_mate.bam O=output--bowtie2-RG.sorted.fixed_mate-Second_Strand-Metrics REF_FLAT=LbrM2903--flatfile.txt STRAND=SECOND_READ_TRANSCRIPTION_STRAND RIBOSOMAL_INTERVALS=rRNAs_LbrM2903.interval_list

refFlat file excerpt:
LBRM2903_000009300 LBRM2903_000009300 LBRM2903_SCAF000663 - 102 434 102 434 1 102 434
LBRM2903_000009400 LBRM2903_000009400 LBRM2903_SCAF000674 - 285 725 285 725 1 285 725
LBRM2903_000009500 LBRM2903_000009500 LBRM2903_SCAF000686 - 544 1026 544 1026 1 544 1026
LBRM2903_000009600 LBRM2903_000009600 LBRM2903_SCAF000689 - 158 1174 158 1174 1 158 1174
LBRM2903_000009700 LBRM2903_000009700 LBRM2903_SCAF000701 - 11 208 11 208 1 11 208
LBRM2903_000009800 LBRM2903_000009800 LBRM2903_SCAF000702 - 147 1901 147 1901 1 147 1901
LbrM2903_010005000 LbrM2903_010005000 LbrM2903_01 - 3897 4895 3897 4895 1 3897 4895
LbrM2903_010005100 LbrM2903_010005100 LbrM2903_01 - 5941 7602 5941 7602 1 5941 7602
LbrM2903_010005200 LbrM2903_010005200 LbrM2903_01 - 9153 11144 9153 11144 1 9153 11144
LbrM2903_010005300 LbrM2903_010005300 LbrM2903_01 - 12123 12683 12123 12683 1 12123 12683
LbrM2903_010005400 LbrM2903_010005400 LbrM2903_01 - 14878 16875 14878 16875 1 14878 16875
LbrM2903_010005500 LbrM2903_010005500 LbrM2903_01 - 18129 18845 18129 18845 1 18129 18845
LbrM2903_010005600 LbrM2903_010005600 LbrM2903_01 - 19985 21838 19985 21838 1 19985 21838
LbrM2903_010005700 LbrM2903_010005700 LbrM2903_01 - 22674 23366 22674 23366 1 22674 23366

intervals_list file excerpt:
@SQ SN:LbrM2903_32 LN:1666992
@SQ SN:LbrM2903_33 LN:1581728
@SQ SN:LbrM2903_34 LN:2192442
@SQ SN:LbrM2903_35 LN:2890053
LbrM2903_06 328796 331000 + LbrM2903_06_rRNA.01
LbrM2903_06 331200 331390 + LbrM2903_06_rRNA.07
LbrM2903_06 331921 333701 + LbrM2903_06_rRNA.13
LbrM2903_06 333757 333968 + LbrM2903_06_rRNA.19
LbrM2903_06 334128 335652 + LbrM2903_06_rRNA.25
LbrM2903_06 335704 335864 + LbrM2903_06_rRNA.31
LbrM2903_06 336365 336437 + LbrM2903_06_rRNA.37
LbrM2903_06 336668 336796 + LbrM2903_06_rRNA.43
LbrM2903_09 439672 439790 + LbrM2903_09_5SrRNA.01
LbrM2903_09 456487 456602 - LbrM2903_09_5SrRNA.02
LbrM2903_11 1421 1543 - LbrM2903_11_5SRRNA.01
LbrM2903_11 9103 9225 - LbrM2903_11_5SRRNA.02

BAM - samtools view -H:
@SQ SN:LBRM2903_SCAF000705 LN:1284
@SQ SN:LBRM2903_SCAF000706 LN:398
@SQ SN:LBRM2903_SCAF000707 LN:353
@SQ SN:LBRM2903_SCAF000708 LN:560
@SQ SN:LBRM2903_SCAF000709 LN:905
@SQ SN:LbrM2903_01 LN:286661
@SQ SN:LbrM2903_02 LN:352826
@SQ SN:LbrM2903_03 LN:408717
@SQ SN:LbrM2903_04 LN:502955
@SQ SN:LbrM2903_05 LN:505114
@SQ SN:LbrM2903_06 LN:590413
@SQ SN:LbrM2903_07 LN:639434
@SQ SN:LbrM2903_08 LN:537514
@SQ SN:LbrM2903_09 LN:620965
@SQ SN:LbrM2903_10 LN:695527

Program version:
picard-2.6.0
java version "1.8.0_101"

Tagged:

Issue · Github
by Sheila

Issue Number
1366
State
closed
Last Updated
Assignee
Array
Milestone
Array
Closed By
chandrans

Answers

  • shleeshlee CambridgePosts: 600 admin
    edited October 2016

    Hi @patriciaruy,

    I'd like to confirm that your libraries are indeed stranded, as you indicate they are with the following parameter:

    STRAND=SECOND_READ_TRANSCRIPTION_STRAND

    Can you check the stdout for something akin to the following line (from this post):

    INFO    2016-01-08 08:54:50     CollectRnaSeqMetrics    Loaded 0 genes.
    

    Do you also have 0 genes or are there genes loaded?

    Also, can you post some of the BAM records that align to an rDNA locus?

    When I open the mapped reads I observe several reads at ribosomal positions.

  • patriciaruypatriciaruy BrazilPosts: 3
    edited October 2016

    Hi @shlee, thanks for the answer.

    My libraries are stranded and the genes are loaded correctly:

    INFO    2016-10-24 14:20:14 CollectRnaSeqMetrics    Loaded 9378 genes.
    

    Some BAM records of this region:

    NS500600:22:HHFG2BGXY:4:11412:12823:16489   163 LbrM2903_06 328806  44  150M    =   328836  181 ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGACGTAATCTGCCGCAAAAATCTTGCGGTTTCCGCAAAATTGGATAACTTGGCG    AAA/AEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEE<EEEEEEAEEEEEEEEEEEAEAEEEEEEEEEEEE<EEA<EAEEEEEEEEEAEEEE
    EEAEAEA<AEAEA/<6AAAEEEA<<<EAEE<<//</    MD:Z:150    RG:Z:AXA1_L004  XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:300    YS:i:302    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:11509:3527:9896 163 LbrM2903_06 328806  44  1S150M  =   328852  197 TATTCTGCCAGTAGTCACATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGAAGGAATATGCGCA
    TGGCTCATTACATCAGACGTAATCTGCAGCAAAAATCTAGCGCTTTCAGCATAATTGGATAACTTGGCG   AA/AAEEEEE/EEEEEEEEEE</EEEE/EEE/EEEE/EEEEEAAAEEEE/EAEAE/EAAEEEEEE//A/EE6EEE/A/AEEEEAEEEEEEA<EEAEEE<///AE//A<///<A/AAAE///<
    A//E/6<//<A/<E/</6//AAA/<//</   MD:Z:16T50C6C33C10T3G4C3A17 RG:Z:AXA1_L004  XG:i:0  NM:i:8  XM:i:8  XN:i:0  XO:i:0  AS:i:258    YS:i:285    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:11601:13134:6390    163 LbrM2903_06 328806  44  150M    =   328835  180 ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGACGTAATCTGCCGCAAAAATCTTGCGGTTTCCGCAAAATTGGATAACTTGGCG    AAAAAEEE/EEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAA
    EEEA<AAEEAEEEEE<EAEEAE//<</EEAAA<AA<    MD:Z:150    RG:Z:AXA1_L004  XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:300    YS:i:302    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:11609:4632:15293    163 LbrM2903_06 328806  44  148M    =   328806  -148    ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGACGTAATCTGCCGCAAAAATCTTGCGGTTTCCGCAAAATTGGATAACTTGG  AAAAAEEAEEAEEEEEEEEEAEEEEEEEEEEE/EEEEEEEEEEEEE<</EEEEEEEEEEEEAEEEEEEEEEEAEEEEEEA<EEEEEEEEEAEAEE/EE<EEE/EEEEE66EEEE
    EAE<AEEE<AEEE/6AE/EEEA<A/EEEEAE<<<  MD:Z:148    RG:Z:AXA1_L004  XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:296    YS:i:296    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:11609:8982:19472    163 LbrM2903_06 328806  44  150M    =   328828  172 ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGACGTAATCTGCCGCAAAAATCTTGCGGTTTCCGCAAAATTGGATAACTTGGCG    AAAAAE/EEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEEEEEEEEEEEEAEEEEEEEEEEEEEAEEEEEEAAEEEAEA<EEE/EEE
    EEEEEEEEEEEEEEEAEEEEE<AA<EEEEAAA<AEA    MD:Z:150    RG:Z:AXA1_L004  XG:i:0  NM:i:0  XM:i:0  XN:i:0  XO:i:0  AS:i:300    YS:i:300    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:12403:11230:9682    163 LbrM2903_06 328806  44  111M    =   328806  -111    ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGACTTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGACGTAATCTGCCGC   AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAEEEEEEEEEEEEEEEE MD:Z:111    RG:Z:AXA1_L004  XG:i:0  NM
    :i:0    XM:i:0  XN:i:0  XO:i:0  AS:i:222    YS:i:222    YT:Z:CP
    NS500600:22:HHFG2BGXY:4:12403:19528:2129    163 LbrM2903_06 328806  44  111M    =   328806  -111    ATTCTGCCAGTAGTCATATGCTTGTTTCAAGGAATTAGCCATGCATGCCTCAGAATCACTGCATTTGCAGGAAT
    CTGCGCATGGCTCATTACATCAGAAGTAATCTGACGC   AAA/A////AAA/A<EE///E/AE/E/EEA6/E/EAEEEEEEE/AEEEEE////EE/6EE//AAEEEE//EEAEEEEAEEAE///E/6EEE/EA6<A</E//AEEE</<</ MD:Z:33C64C8C3  RG:Z:AXA1_L004  XG:i:0  NM
    :i:3    XM:i:3  XN:i:0  XO:i:0  AS:i:207    YS:i:222    YT:Z:CP
    

    The image of the rRNA locus with the mapped reads is attached. Would you like the BAM file with all the reads of this locus?

    rRNAs.png
    1140 x 582 - 717K
    Post edited by shlee on
  • shleeshlee CambridgePosts: 600 admin

    Just wanted to let you know @patriciaruy that I'm back from being out of town and will look into this soon.

  • shleeshlee CambridgePosts: 600 admin

    @patriciaruy, Let's try to debug this together. For reference, here is the link to the tool documentation. Here are some things to try or answer:

    1. Try the other STRAND options: FIRST_READ_TRANSCRIPTION_STRAND and NONE. Do either one of these options change the results?
    2. Can you show us an excerpt of the refFlat file that is from the same interval as the RIBOSOMAL_INTERVALS file? Let's see if these intervals match each other in coordinates and also confirm that the ribosomal intervals are also present in the refFlat intervals.
    3. Do you observe any other peculiarities with the output RNA metrics?
    4. I'm not familiar with the genomics browser you are using. We typically use IGV. Can you explain what the blue versus green represent?

    If these puzzle pieces yield some clues as to why you are observing RIBOSOMAL_BASES=0, then we can follow up on them. Otherwise, we will ask you for enough of your data to recapitulate the observation. We'll need the input BAM, REF_FLAT and RIBOSOMAL_INTERVALS files or portions thereof. Also, to visualize the data, and to possibly manipulate it, we'll need the reference files you use (FASTA, FAI, DICT). Information on how to send files to us is here.

  • patriciaruypatriciaruy BrazilPosts: 3

    Thanks @shlee,

    1. I already test the other strand options and the RIBOSOMAL_BASES still zero.

    2. Excerpt of the refFlat file with some RIBOSOMAL_INTERVALS:
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 328796 330999 328796 330999 1 328796 330999
      LbrM2903_060014200 LbrM2903_060014200 LbrM2903_06 + 331220 331387 331220 331387 1 331220 331387
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 331921 333701 331921 333701 1 331921 333701
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 333757 333968 333757 333968 1 333757 333968
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 334128 335652 334128 335652 1 334128 335652
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 335704 335864 335704 335864 1 335704 335864
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 335682 335864 335682 335864 1 335682 335864
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 336365 336437 336365 336437 1 336365 336437
      LbrM2903_060014100 LbrM2903_060014100 LbrM2903_06 + 336668 336796 336668 336796 1 336668 336796

    3. I didn't observe any other peculiarity in the output.

    4. The genomic browser is for pathogens and calls Artemis (Sanger Institute). Blue and green lines are the mapped reads from one library.

    The upload of the files was made (RIBOSOMAL_BASES-problem.tar.gz).

    Issue · Github
    by Sheila

    Issue Number
    1401
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    sooheelee
  • shleeshlee CambridgePosts: 600 admin

    @patriciaruy I received your tarball and will get to it soon.

  • shleeshlee CambridgePosts: 600 admin

    Hi @patriciaruy,

    I can recapitulate your zero counts with your files. I'm noticing two things. (1) Your refFlat coordinates are noncanonical. Compare yours:

    LBRM2903_000005000  LBRM2903_000005000  LBRM2903_SCAF000006 -   1564    1776    1564    1776    1   1564    1776
    LBRM2903_000005100  LBRM2903_000005100  LBRM2903_SCAF000009 -   37  342 37  342 1   37  342
    

    to an excerpt of another user's refFlat:

    DDX11L1 NR_046018   chr1    +   11873   14409   14409   14409   3   11873,12612,13220,  12227,12721,14409,
    WASH7P  NR_024540   chr1    -   14361   29370   29370   29370   11  14361,14969,15795,16606,16857,17232,17605,17914,18267,24737,29320,  14829,15038,15947,16765,17055,17368,17742,18061,18366,24891,29370,
    

    Your refFlat uses identical tx, cds and exon coordinates, whereas the 2nd refFlat differentiates them. Perhaps you can find out if this is the cause for the zero ribosome locus counts using a similarly formatted refFlat file.

    (2) Your ribosome_interval list uses coordinates that appear 1-coordinate off at the ends. I don't think this should make a difference towards what counts as rDNA, and I actually have to check to see how the different files are interpreted by the tool, but it looks like a potential 0-based-index/1-based-index conversion snafu.

    wmd2a-330:RIBOSOMAL_BASES-problem shlee$ cat LbrM2903--flatfile.txt | grep '328796'
    LbrM2903_060014100  LbrM2903_060014100  LbrM2903_06 +   328796  330999  328796  330999  1   328796  330999
    wmd2a-330:RIBOSOMAL_BASES-problem shlee$ cat rRNAs_LbrM2903.interval_list | grep '328796'
    LbrM2903_06 328796  331000  +   LbrM2903_06_rRNA.01
    

    I'm gearing up for a workshop next week and so unfortunately I don't have a solution for you at the moment. What I can tell you is to try another tool for now called RNA-SeQC at http://archive.broadinstitute.org/cancer/cga/rna-seqc. It also counts rDNA bases. If you decide to try out this tool, I believe it requires Java v7 (as discussed here).

Sign In or Register to comment.