
Is it sufficient to mark the duplicates per-sample?

akrishnaakrishna Posts: 2Member
edited March 18 in Ask the GATK team

Referring to broadinstitute.org/gatk/guide/article?id=3060: does duplicate removal really need to be done twice, once per lane and then again per sample?

Is it not enough to mark the duplicates once, in the final BAM file with all the lanes merged, which should catch both optical and PCR duplicates (I am using Picard MarkDuplicates.jar)? Specifically, compared with the workflow in the link above, what is wrong with generating:

  • sample1_lane1.realn.recal.bam
  • sample1_lane2.realn.recal.bam
  • sample2_lane1.realn.recal.bam
  • sample2_lane2.realn.recal.bam

Then, merging them to get

  • sample1.merged.bam
  • sample2.merged.bam

and finally, running the "de-dupping" step only on the merged BAM files:

  • sample1.merged.dedup.realn.bam
  • sample2.merged.dedup.realn.bam
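
To make the proposed order of operations concrete, here is a minimal sketch in Python that only builds the command lines for "merge per sample, then mark duplicates once". It does not run anything. It assumes the older per-tool Picard jars (MergeSamFiles.jar and MarkDuplicates.jar) with their INPUT=, OUTPUT= and METRICS_FILE= arguments. The function name, the metrics file name, and the rule that everything before "_lane" is the sample name are hypothetical choices made for illustration. The sketch stops at the de-dupped BAM and leaves out the realignment step that would produce the .realn.bam files.

```python
from collections import defaultdict


def plan_merge_then_dedup(lane_bams):
    """Return Picard command lines (as argument lists) that merge the
    per-lane BAMs of each sample and then mark duplicates once per sample.

    Hypothetical helper: assumes files are named <sample>_lane<N>....bam.
    """
    by_sample = defaultdict(list)
    for bam in lane_bams:
        sample = bam.split("_lane")[0]  # assumed naming convention
        by_sample[sample].append(bam)

    commands = []
    for sample, bams in sorted(by_sample.items()):
        merged = f"{sample}.merged.bam"
        dedup = f"{sample}.merged.dedup.bam"

        # Step 1: merge all lanes of this sample into a single BAM.
        merge_cmd = ["java", "-jar", "MergeSamFiles.jar"]
        merge_cmd += [f"INPUT={b}" for b in bams]
        merge_cmd.append(f"OUTPUT={merged}")
        commands.append(merge_cmd)

        # Step 2: mark duplicates only on the merged, per-sample BAM.
        commands.append([
            "java", "-jar", "MarkDuplicates.jar",
            f"INPUT={merged}",
            f"OUTPUT={dedup}",
            f"METRICS_FILE={sample}.dedup.metrics.txt",  # assumed file name
        ])
    return commands


lanes = [
    "sample1_lane1.realn.recal.bam",
    "sample1_lane2.realn.recal.bam",
    "sample2_lane1.realn.recal.bam",
    "sample2_lane2.realn.recal.bam",
]

for cmd in plan_merge_then_dedup(lanes):
    print(" ".join(cmd))
```

Each printed line is one command to run in order; whether skipping the per-lane de-dupping pass is acceptable is exactly the open question above, so this only illustrates the alternative being asked about.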