Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Is it sufficient to mark the duplicates per-sample?
Referring to broadinstitute.org/gatk/guide/article?id=3060, is removing duplicates necessary to be done twice, once per-lane and then per-sample?
Is it not enough to just mark the duplicates in the final BAM file with all the lanes merged, which should remove both optical and PCR duplicates (I am using Picard MarkDuplicates.jar)? So specifically, in the link above what is wrong with generating -
Then, merging them to get
and finally, include "de-dupping" only for the merged BAM file.