I am quite new to the analysis of RNA-seq data and have been searching the web for help interpreting the results output by EstimateLibraryComplexity. The only discussion I found was this one (https://www.biostars.org/p/103503/), but it did not fully solve my problem.
I am interested in the distribution of the duplicates in my RNA-seq dataset, which, if I understood correctly, is shown in the histogram produced by EstimateLibraryComplexity.
This is my output:
LIBRARY UNPAIRED_READS_EXAMINED READ_PAIRS_EXAMINED SECONDARY_OR_SUPPLEMENTARY_RDS UNMAPPED_READS UNPAIRED_READ_DUPLICATES READ_PAIR_DUPLICATES READ_PAIR_OPTICAL_DUPLICATES PERCENT_DUPLICATION ESTIMATED_LIBRARY_SIZE
Unknown 0 9285609 0 0 0 1376414 156167 0.148231 31035324
What does this histogram mean? Does the "1" in the first row and first column mean that there are 6783948 reads that have 1 duplicate, or are these 6783948 unique reads? If not, how do I calculate the number of reads that do not have any duplicates?
Is it possible also to calculate the number of total reads examined or the total number of duplicates within this sample? If so how?
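For the totals, this is what I have tried so far. It is only a sketch resting on an assumption I could not confirm: that each read pair counts as two reads, and that PERCENT_DUPLICATION is the number of duplicate reads divided by the number of reads examined. The variable names are my own; the numbers are copied from the metrics line above.

```python
# Values copied from the metrics line above.
unpaired_reads_examined = 0
read_pairs_examined = 9285609
unpaired_read_duplicates = 0
read_pair_duplicates = 1376414
reported_percent_duplication = 0.148231

# Assumption (not confirmed): every read pair contributes two reads.
total_reads_examined = unpaired_reads_examined + 2 * read_pairs_examined
total_duplicate_reads = unpaired_read_duplicates + 2 * read_pair_duplicates

# Assumption (not confirmed): duplication = duplicate reads / reads examined.
duplication_fraction = total_duplicate_reads / total_reads_examined

print(total_reads_examined)    # 18571218
print(total_duplicate_reads)   # 2752828
print(round(duplication_fraction, 6), reported_percent_duplication)
```

Under that assumption the computed fraction agrees with the reported PERCENT_DUPLICATION to six decimal places, but I would appreciate confirmation that this is the right way to read these columns.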
Is it also possible to obtain the number of duplicates at each genomic position to which reads map? Can I get this information from the histogram as well?
I hope it is clear what I am asking. I would be very grateful if someone could shed some light on this.