Grouping duplicate reads together
I've been using the Picard tools MarkDuplicates tool. I'd like to identify which reads are duplicates of each other (i.e., if read.1234 is a duplicate of read.5678, I want to be able to retrieve this relationship). Does the MarkDuplicates output indicate this in any way? I could group reads that share the same start coordinate listed in the BAM file, but this gets a little tricky if the reads align to the minus strand, or if there are mismatches in the first couple of nucleotides of the read. I assume MarkDuplicates must be collecting this information behind the scenes when it finds duplicates. Thank you very much for your help.
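
To make the coordinate-based fallback concrete, here is a minimal sketch of what I mean by grouping on start position. It is not Picard's implementation, only my assumption of how the two tricky cases could be handled: it keys each read on chromosome, strand, and the unclipped 5' coordinate, so minus-strand reads are compared by their rightmost end, and soft-clipped leading bases (which is how I assume mismatches at the start of a read would usually show up) are added back. The function names and the record tuples are hypothetical, and mate position and library are ignored.

```python
import re
from collections import defaultdict

# Matches one CIGAR element, e.g. "48M" or "2S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def five_prime_unclipped(pos, cigar, reverse):
    """Return the 1-based unclipped 5' coordinate of an alignment.

    pos is the leftmost aligned position (SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        # Plus strand: the 5' end is on the left, so add back leading clips.
        lead = 0
        for n, op in ops:
            if op not in "SH":
                break
            lead += n
        return pos - lead
    # Minus strand: the 5' end is on the right, so walk to the last
    # reference base and add back trailing clips.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    return pos + ref_len - 1 + trail

def group_candidate_duplicates(records):
    """Group read names by (chrom, is_reverse, unclipped 5' position).

    records: iterable of (name, chrom, pos, cigar, flag) tuples
    (a hypothetical layout taken from the SAM columns).
    Only groups with more than one read are returned.
    """
    groups = defaultdict(list)
    for name, chrom, pos, cigar, flag in records:
        if flag & 0x4:  # skip unmapped reads
            continue
        reverse = bool(flag & 0x10)
        key = (chrom, reverse, five_prime_unclipped(pos, cigar, reverse))
        groups[key].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

# Made-up records for illustration only.
records = [
    ("read.1234", "chr1", 100, "50M", 0),
    ("read.5678", "chr1", 102, "2S48M", 0),   # clipped start, same 5' end
    ("read.9", "chr1", 151, "50M", 16),
    ("read.10", "chr1", 155, "3S46M", 16),    # different POS, same 5' end
    ("read.11", "chr1", 100, "50M", 16),      # minus strand, no partner
]
groups = group_candidate_duplicates(records)
```

With these made-up records, read.1234 and read.5678 end up in one group and read.9 and read.10 in another, even though their POS values differ. I'd still prefer to get the groupings directly from MarkDuplicates if it exposes them, since this sketch only approximates whatever it does internally.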