
The order of merging and MarkDuplicates

Sunhye (Korea, Member)

I have a whole-genome sequencing sample with one FASTQ file per lane, so each sample consists of multiple files, one produced for each lane.

After merging the multiple BAM files, I run MarkDuplicates using Picard, but MarkDuplicates is very slow.

So I want to run MarkDuplicates on each per-lane BAM first and then merge the BAM files.

Does the order of merging and MarkDuplicates affect downstream analysis?
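To see why the order can matter when the same library is sequenced on several lanes, here is a toy model of duplicate marking. It is a deliberate simplification, not Picard's algorithm: a read counts as a duplicate if a read already seen has the same chromosome, 5' start and strand (Picard also uses mate positions for pairs and keeps the best-quality read rather than the first one seen). The `Read` class and the `mark_duplicates` and `per_lane_then_merge` functions are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Read(NamedTuple):
    name: str
    chrom: str
    start: int   # simplified 5' start position
    strand: str  # "+" or "-"
    lane: str


def mark_duplicates(reads: Iterable[Read]) -> Set[str]:
    """Toy rule (assumption): reads sharing (chrom, start, strand) are duplicates.

    The first read seen for each key is kept; later ones are flagged.
    """
    seen = set()
    dups = set()
    for r in reads:
        key = (r.chrom, r.start, r.strand)
        if key in seen:
            dups.add(r.name)
        else:
            seen.add(key)
    return dups


def per_lane_then_merge(reads: Iterable[Read]) -> Set[str]:
    """Mark duplicates within each lane separately, then pool the flags."""
    by_lane = defaultdict(list)
    for r in reads:
        by_lane[r.lane].append(r)
    dups = set()
    for lane_reads in by_lane.values():
        dups |= mark_duplicates(lane_reads)
    return dups


reads = [
    Read("r1", "chr1", 1000, "+", "L001"),
    Read("r2", "chr1", 1000, "+", "L001"),  # duplicate of r1 on the same lane
    Read("r3", "chr1", 1000, "+", "L002"),  # copy of the same fragment on another lane
    Read("r4", "chr1", 2000, "-", "L002"),
]

print(sorted(mark_duplicates(reads)))      # merged first: ['r2', 'r3']
print(sorted(per_lane_then_merge(reads)))  # per lane first: ['r2']
```

In this model, the copy on lane L002 never meets its original on lane L001 when lanes are processed separately, so it stays unmarked, and merging afterwards cannot recover that mark.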

Best Answer


  • Sunhye (Korea, Member)

    Thanks Sheila!

  • falker (Germany, Member)

    I have already finished my GATK Best Practices analysis and just realized that I ran MarkDuplicates separately on each file from each lane belonging to a sample.

    I'm afraid this is not the right way. Do I have to at least run MarkDuplicates per lane, or can I trust my variant calls with it done this way?
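    If you decide to rerun over all of a sample's files together, one option is a single MarkDuplicates run, because Picard MarkDuplicates accepts the INPUT (I=) argument more than once, so no separate merge step is needed beforehand. The sketch below only builds the command line for the aligned per-file BAMs. The jar path `picard.jar`, the file names and the helper `markduplicates_command` are hypothetical, and it assumes the legacy `I=`/`O=`/`M=` argument syntax; newer Picard releases also use the `-I`/`-O`/`-M` form, so check the documentation for your version.

```python
import shlex
from typing import List, Sequence


def markduplicates_command(lane_bams: Sequence[str], output_bam: str,
                           metrics_file: str,
                           picard_jar: str = "picard.jar") -> List[str]:
    """Build one MarkDuplicates invocation over all BAMs of one sample (hypothetical helper)."""
    if not lane_bams:
        raise ValueError("need at least one input BAM")
    cmd = ["java", "-jar", picard_jar, "MarkDuplicates"]
    # One I= per input file, so reads from every file are compared together.
    cmd += [f"I={bam}" for bam in lane_bams]
    cmd += [f"O={output_bam}", f"M={metrics_file}"]
    return cmd


cmd = markduplicates_command(
    ["sample1_L001.bam", "sample1_L002.bam"],
    "sample1.dedup.bam",
    "sample1.dup_metrics.txt",
)
print(shlex.join(cmd))
```

    To actually run it, pass the list to `subprocess.run(cmd, check=True)`; keeping the command as a list avoids shell-quoting problems with unusual file names.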

  • falker (Germany, Member)
    edited February 2018

    I can answer that question myself, for anyone interested in the impact of merging the files before running MarkDuplicates:

    Dataset: ~20x coverage, 84 separate u.bam files

    Duplicates found, joint run (all files in one MarkDuplicates run): 8,610,548
    Duplicates found, separate runs (files merged after MarkDuplicates): 547,105

    So in this case, the joint run found about 16x more duplicates (8,610,548 / 547,105 ≈ 15.7) than running the files separately.
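    The fold change, and the share of the joint run's duplicates that the separate runs missed, follow directly from the two counts above. A quick check in Python (the variable names are illustrative):

```python
# Counts reported above for the ~20x dataset with 84 u.bam files.
joint_run = 8_610_548      # duplicates found with all files in one MarkDuplicates run
separate_runs = 547_105    # duplicates found when files were processed separately, then merged

fold = joint_run / separate_runs
fraction_missed = 1 - separate_runs / joint_run

print(f"{fold:.1f}x more duplicates in the joint run")                    # 15.7x ...
print(f"{fraction_missed:.1%} of the joint-run duplicates were missed")  # 93.6% ...
```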

  • Sheila (Broad Institute, Member, Broadie admin)
    edited March 2018


    Thanks for sharing. Perhaps this thread and this article will help as well.


    EDIT: This one too :smile:
