Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Picard Mark Duplicates handling of Library Information
I was hoping this had been addressed already on the forum, but I've not seen a definitive answer although I have seen a similar question posed on this and other forums.
Our current mark duplicate procedure using Picard MarkDuplicates is to run merges across lane data generated from the same library. I believe this makes sense, and once duplicates are marked, then library level merges are combined to create a sample level, multi-library bam file. Any duplicates found across libraries would not be expected to be PCR duplicates but instead just identical fragments.
It's not clear though whether Picard MarkDuplicates is library aware....ie. when it does mark duplicates does it account for read pairs only from the same library, or if run against a bam merge generated from multiple libraries, will it mark any duplicates it finds.
I don't see this addressed in the documentation, so I assume that is not the case, but I have seen suggestions elsewhere that it might be so.