Picard Optical Duplicates calculation 2.0 vs 2.1+

tbooth (Edinburgh, UK), Member


We have an old pipeline that has been running with Picard 1.141 and reporting various metrics, including optical dupes in our Illumina data. I'm just upgrading it to use Picard 2.8.1 and I see that the optical dupe numbers are now coming out way lower, like two orders of magnitude lower! Digging through the GitHub history, I think this is the commit that changed things, back in Feb 2016 just prior to release 2.1, contributed by Tim Fennell:

And I've confirmed by running Picard 2.0 and 2.1 that the change is definitely between these releases.

I can't see from eyeballing the code if there was a fundamental change to the calculation or if maybe it's to do with the READ_NAME_REGEX setting - I've just been using the default as the reads follow the standard bcl2fastq naming convention. To save me having to do some sort of forensic analysis I wondered if anyone on the forum can explain the change or remembers any discussion that happened at the time? I've tested this on a single tile from a HiSeq 4000 run and also on unrelated data from a HiSeq 2500 lane so I don't think it's something special to my BAM files. Any ideas?
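For readers unfamiliar with how read names feed into the optical duplicate calculation, here is a minimal Python sketch of pulling tile, x and y out of a bcl2fastq-style read name. It is not Picard's code. It assumes the behaviour that recent Picard documentation describes for the default READ_NAME_REGEX (the last three ':'-separated fields are taken as numeric tile, x and y); whether that matches every Picard version discussed here is an assumption. The function name `parse_tile_xy` and the example read name are made up for illustration.

```python
def parse_tile_xy(read_name):
    """Return (tile, x, y) from the last three ':'-separated fields.

    Assumes a bcl2fastq-style name such as
    instrument:run:flowcell:lane:tile:x:y (hypothetical example below).
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        raise ValueError(f"cannot find tile/x/y in read name: {read_name!r}")
    # int() raises ValueError if any of the three fields is not numeric.
    tile, x, y = (int(field) for field in fields[-3:])
    return tile, x, y


# Hypothetical read name following the seven-field bcl2fastq layout.
example_name = "K00166:12:HXXXXBBXX:1:1101:2000:1500"
tile, x, y = parse_tile_xy(example_name)
print(tile, x, y)
```

If a name does not end in three numeric fields, no coordinates can be extracted, and without coordinates reads cannot be compared by position at all.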

Many thanks in advance,



  • shlee (Cambridge), Member, Broadie ✭✭✭✭✭
    edited January 2017

    Hi @tbooth,

    Our developer says you should actually see the opposite: the changes should increase the number of optical duplicates, not decrease them.

    The original method counted a read as an optical duplicate only if it was close to the representative read. Now, any group of reads that are close to one another will have only one non-optical duplicate.

    Does this help clarify?
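The difference between the two counting rules described above can be sketched in a few lines of Python. This is an illustration only, not Picard's implementation. It assumes all reads belong to one duplicate set on the same tile, that two reads are "close" when both their x and y offsets are within a pixel threshold (100 here, chosen for the example), and that "group" means reads connected through chains of close pairs. The function names are hypothetical.

```python
from itertools import combinations


def is_close(a, b, max_dist):
    """Assumed closeness test: both x and y offsets within max_dist pixels."""
    return abs(a[0] - b[0]) <= max_dist and abs(a[1] - b[1]) <= max_dist


def count_near_representative(reads, rep_index, max_dist):
    """Older rule as described: count reads close to the representative."""
    rep = reads[rep_index]
    return sum(
        1
        for i, read in enumerate(reads)
        if i != rep_index and is_close(read, rep, max_dist)
    )


def count_by_cluster(reads, max_dist):
    """Newer rule as described: each group of close reads keeps one
    non-optical duplicate; every other read in the group is counted."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(reads)), 2):
        if is_close(reads[i], reads[j], max_dist):
            parent[find(i)] = find(j)

    n_clusters = len({find(i) for i in range(len(reads))})
    return len(reads) - n_clusters


# (x, y) positions of four duplicate reads; the representative (index 0)
# sits far away from a tight group of three.
reads = [(100, 100), (5000, 5000), (5010, 5005), (5020, 5000)]
print(count_near_representative(reads, rep_index=0, max_dist=100))
print(count_by_cluster(reads, max_dist=100))
```

In this made-up example the representative is far from the three clustered reads, so the older rule counts none of them while the cluster rule counts two. That is the direction the developer describes: under these assumptions the newer rule can only match or exceed the older count.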
