Grouping duplicate reads together
I've been using the Picard MarkDuplicates tool. I'd like to identify which reads are duplicates of each other (i.e., if read.1234 is a duplicate of read.5678, I want to be able to retrieve this relationship). Does the MarkDuplicates output indicate this in any way? I could group reads together if they share the same start coordinate listed in the BAM file, but this gets tricky when the reads align to the minus strand, or when there are mismatches in the first couple of nucleotides of the read. I assume MarkDuplicates must be collecting this information behind the scenes when it finds duplicates. Thank you very much for your help.
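To make the manual grouping idea concrete, here is a minimal sketch of what I have in mind. It is not MarkDuplicates' actual code, only an approximation under some assumptions: reads are single-end, each record is a plain tuple of (name, chromosome, 1-based POS, CIGAR, reverse-strand flag) already pulled out of the BAM, and library and mate information are ignored. The function names `unclipped_five_prime` and `group_duplicates` are made up for illustration. The key is the unclipped 5' coordinate plus strand, which handles minus-strand reads and reads whose first bases were soft-clipped by the aligner.

```python
import re
from collections import defaultdict

# Matches one CIGAR operation, e.g. "48M" or "2S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unclipped_five_prime(pos, cigar, reverse):
    """Return the 1-based reference coordinate of the read's 5' end,
    extended over any soft or hard clipping at that end.

    pos is the SAM POS field (1-based leftmost aligned base).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        # Forward strand: the 5' end is on the left, so subtract leading clips.
        leading = 0
        for n, op in ops:
            if op not in "SH":
                break
            leading += n
        return pos - leading
    # Reverse strand: the 5' end is on the right, so take the alignment end
    # (POS plus reference-consuming operations) and add trailing clips.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trailing = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trailing += n
    return pos + ref_len - 1 + trailing


def group_duplicates(records):
    """Group read names that share chromosome, strand and unclipped 5' end.

    Returns only groups with more than one read (hypothetical helper).
    """
    groups = defaultdict(list)
    for name, chrom, pos, cigar, reverse in records:
        key = (chrom, unclipped_five_prime(pos, cigar, reverse), reverse)
        groups[key].append(name)
    return [names for names in groups.values() if len(names) > 1]


# Made-up example records: (name, chrom, POS, CIGAR, is_reverse)
records = [
    ("read.1234", "chr1", 100, "50M", False),
    ("read.5678", "chr1", 102, "2S48M", False),  # first 2 bases soft-clipped
    ("read.9", "chr1", 100, "50M", True),
    ("read.10", "chr1", 110, "37M3S", True),     # same 5' end after unclipping
    ("read.11", "chr1", 300, "50M", False),      # no partner
]
duplicate_groups = group_duplicates(records)
```

With these made-up records, read.1234 and read.5678 end up in one group and read.9 and read.10 in another. For paired-end data the key would presumably also need the mate's position and orientation, which this sketch leaves out.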