Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Grouping duplicate reads together
I've been using the Picardtools MarkDuplicates tool. I'd like to identify which reads are duplicates of each other (ie. if read.1234 is a duplicate of read.5678, I want to be able to retrieve this relationship). Does the MarkDuplicates output indicate this in any way? While I could group reads together if they share the same start coordinate listed in the BAM file, this gets a little tricky if the reads align to the minus strand, or if there are mismatches in the first couple of nucleotides in the read. I think the MarkDuplicates program must be collecting this information behind the scenes when it's finding duplicates. Thank you very much for your help.