The current GATK version is 3.6-0

Is it sufficient to mark the duplicates per-sample?

akrishna Member Posts: 3
edited March 2014 in Ask the GATK team

Referring to broadinstitute.org/gatk/guide/article?id=3060, does duplicate marking really need to be done twice, once per lane and then again per sample?

Is it not enough to mark duplicates only in the final BAM file with all the lanes merged, which should catch both optical and PCR duplicates (I am using Picard MarkDuplicates.jar)? Specifically, in the workflow at the link above, what is wrong with generating

  • sample1_lane1.realn.recal.bam
  • sample1_lane2.realn.recal.bam
  • sample2_lane1.realn.recal.bam
  • sample2_lane2.realn.recal.bam

then merging them to get

  • sample1.merged.bam
  • sample2.merged.bam

and finally "de-dupping" only the merged BAM files to get

  • sample1.merged.dedup.realn.bam
  • sample2.merged.dedup.realn.bam
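
To make the proposed order of operations concrete, here is a rough Python sketch that only builds (and prints) the Picard command lines for "merge per sample, then mark duplicates once". It assumes the old per-tool jars (MergeSamFiles.jar, MarkDuplicates.jar) with their INPUT=, OUTPUT= and METRICS_FILE= arguments, and the per-lane file naming from the list above. The jar paths, the intermediate name `<sample>.merged.dedup.bam` and the metrics file name are placeholders I made up for illustration; the sketch stops at duplicate marking and does not cover any later realignment step.

```python
import re
from collections import defaultdict

# Hypothetical jar locations; adjust to the local Picard installation.
MERGE_JAR = "MergeSamFiles.jar"
MARKDUP_JAR = "MarkDuplicates.jar"

# Matches the per-lane naming used in this post, e.g. sample1_lane2.realn.recal.bam
LANE_BAM = re.compile(r"^(?P<sample>.+)_lane\d+\.realn\.recal\.bam$")


def group_by_sample(bams):
    """Group per-lane BAM file names by their sample prefix."""
    groups = defaultdict(list)
    for bam in bams:
        match = LANE_BAM.match(bam)
        if match is None:
            raise ValueError(f"unexpected file name: {bam}")
        groups[match.group("sample")].append(bam)
    return dict(groups)


def build_commands(bams):
    """Return one merge command and one duplicate-marking command per sample."""
    commands = []
    for sample, lanes in sorted(group_by_sample(bams).items()):
        merged = f"{sample}.merged.bam"
        dedup = f"{sample}.merged.dedup.bam"          # placeholder output name
        metrics = f"{sample}.dup_metrics.txt"         # placeholder metrics name
        # Step 1: merge all lanes of this sample into one BAM.
        commands.append(
            ["java", "-jar", MERGE_JAR]
            + [f"INPUT={lane}" for lane in lanes]
            + [f"OUTPUT={merged}"]
        )
        # Step 2: mark duplicates once, on the merged per-sample BAM.
        commands.append(
            ["java", "-jar", MARKDUP_JAR,
             f"INPUT={merged}", f"OUTPUT={dedup}", f"METRICS_FILE={metrics}"]
        )
    return commands


if __name__ == "__main__":
    lane_bams = [
        "sample1_lane1.realn.recal.bam",
        "sample1_lane2.realn.recal.bam",
        "sample2_lane1.realn.recal.bam",
        "sample2_lane2.realn.recal.bam",
    ]
    for cmd in build_commands(lane_bams):
        print(" ".join(cmd))
```

Nothing is executed here; the printed lines can be inspected first and then run by hand or passed to `subprocess.run`.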

Answers

  • akrishna Member Posts: 3

    Is this perhaps because base quality score recalibration, which is a per-lane step, would be affected by duplicates if we did not mark them during per-lane pre-processing?
