Simple explanation of MarkDuplicates

I am having a hard time understanding how MarkDuplicates works. The MarkDuplicates documentation describes it like this: "The MarkDuplicates tool works by comparing sequences in the 5 prime positions of both reads and read-pairs in a SAM/BAM file." I don't understand what "5 prime positions" means in that statement. Also, what does "of both reads and read-pairs" mean in this context? If you could explain this using an example, I would really appreciate it.
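To make "5 prime positions" concrete, here is a simplified sketch (not Picard's actual code). A read's 5' end is where sequencing started. For a read aligned to the forward strand, that is its leftmost reference coordinate. For a read aligned to the reverse strand, it is the rightmost coordinate. Clipped bases are added back ("unclipped" position), so two copies of the same molecule still match even when one was soft-clipped differently. In this simplified model, a single read (fragment) is keyed by one 5' position plus strand, and a read pair by the 5' positions and strands of *both* mates. Reads or pairs with identical keys are treated as duplicates. The `Read` type, its fields, and the helper functions are hypothetical and exist only for illustration. The real tool also takes the library into account and keeps the best-scoring copy in each group, which this sketch skips.

```python
import re
from collections import defaultdict
from typing import List, NamedTuple, Tuple

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class Read(NamedTuple):
    """Hypothetical minimal alignment record (not a real library type)."""
    name: str
    chrom: str
    pos: int        # 1-based leftmost aligned position (the SAM POS field)
    cigar: str
    reverse: bool   # True if the read aligned to the reverse strand


def unclipped_five_prime(read: Read) -> int:
    """Reference coordinate of the read's 5' end, counting clipped bases."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(read.cigar)]
    if not read.reverse:
        # Forward strand: the 5' end is the leftmost base, so add back leading clips.
        leading = 0
        for n, op in ops:
            if op not in "SH":
                break
            leading += n
        return read.pos - leading
    # Reverse strand: the 5' end is the rightmost base.
    ref_len = sum(n for n, op in ops if op in "MDN=X")  # ops that consume reference
    trailing = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trailing += n
    return read.pos + ref_len - 1 + trailing


def fragment_key(read: Read) -> Tuple[str, int, bool]:
    """Key for a single read: chromosome, unclipped 5' position, strand."""
    return (read.chrom, unclipped_five_prime(read), read.reverse)


def find_duplicate_pairs(pairs: List[Tuple[Read, Read]]) -> List[List[str]]:
    """Group read pairs whose two mates share the same 5' keys."""
    groups = defaultdict(list)
    for r1, r2 in pairs:
        # Sort the two ends so mate order does not matter.
        key = tuple(sorted([fragment_key(r1), fragment_key(r2)]))
        groups[key].append(r1.name)
    return [names for names in groups.values() if len(names) > 1]


# Pair B is soft-clipped differently from pair A, but both pairs start
# at the same unclipped 5' positions (100 forward, 349 reverse).
example_pairs = [
    (Read("A", "chr1", 100, "50M", False), Read("A", "chr1", 300, "50M", True)),
    (Read("B", "chr1", 102, "2S48M", False), Read("B", "chr1", 300, "48M2S", True)),
    (Read("C", "chr1", 100, "50M", False), Read("C", "chr1", 310, "50M", True)),
]

if __name__ == "__main__":
    print(find_duplicate_pairs(example_pairs))  # prints [['A', 'B']]
```

Pairs A and B share both 5' ends, so they are grouped as duplicates. Pair C shares only the forward mate's 5' end, so it is not.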
Answers
Hi @ZLak,
Check out Article#6747 and our workshop presentations. Some of the slides go into detail on exactly this question.
Hi @shlee,
Thanks very much for your response. I have already read the article several times, and nowhere does it explain what "5 prime positions" means in the statement above. Any ideas?
@ZLak
Hi,
I think this dictionary entry on read pairs will help. For an explanation of "5 prime," I would google it.
-Sheila
Thanks for your answer.