Picard Optical Duplicates calculation 2.0 vs 2.1+

tbooth (Edinburgh, UK), Member


We have an old pipeline that has been running with Picard 1.141 and reporting various metrics including optical dupes in our Illumina data. I'm just upgrading it to use Picard 2.8.1 and I see that the optical dupe numbers are now coming out way lower - like two orders of magnitude lower! Digging in the GitHub history I think this is the commit that changed things, back in Feb 2016 just prior to release 2.1, contributed by Tim Fennell:

And I've confirmed by running Picard 2.0 and 2.1 that the change is definitely between these releases.

I can't see from eyeballing the code if there was a fundamental change to the calculation or if maybe it's to do with the READ_NAME_REGEX setting - I've just been using the default as the reads follow the standard bcl2fastq naming convention. To save me having to do some sort of forensic analysis I wondered if anyone on the forum can explain the change or remembers any discussion that happened at the time? I've tested this on a single tile from a HiSeq 4000 run and also on unrelated data from a HiSeq 2500 lane so I don't think it's something special to my BAM files. Any ideas?

Many thanks in advance,



  • shlee (Cambridge), Member, Broadie, Moderator
    edited January 17

    Hi @tbooth,

    Our developer says you should actually see the opposite: the change should increase the number of optical duplicates, not decrease it.

    The original method defined an optical duplicate as a read that is close to the representative read. Now, any group of reads that are close to one another will contain only one non-optical duplicate.

    Does this help clarify?
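
To make the two counting rules in the reply concrete, here is a rough Python sketch for a single duplicate set. It is not Picard's code: the function names, the sample coordinates, the choice of the first read as the representative, and the closeness test (same tile, x and y each within 100 pixels) are all assumptions made for illustration.

```python
from typing import List, Tuple

# A read location as (tile, x, y). Hypothetical layout for this sketch only.
Read = Tuple[int, int, int]


def is_close(a: Read, b: Read, max_dist: int = 100) -> bool:
    """Assumed closeness test: same tile and within max_dist on both axes."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= max_dist and abs(a[2] - b[2]) <= max_dist


def count_relative_to_representative(reads: List[Read], rep_index: int = 0,
                                     max_dist: int = 100) -> int:
    """Older rule as described: count reads that are close to the representative."""
    rep = reads[rep_index]
    return sum(1 for i, r in enumerate(reads)
               if i != rep_index and is_close(rep, r, max_dist))


def count_by_clusters(reads: List[Read], max_dist: int = 100) -> int:
    """Newer rule as described: each group of close reads keeps one
    non-optical duplicate, so optical duplicates = reads - groups."""
    parent = list(range(len(reads)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of reads that pass the closeness test.
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if is_close(reads[i], reads[j], max_dist):
                parent[find(i)] = find(j)

    groups = len({find(i) for i in range(len(reads))})
    return len(reads) - groups


# Made-up duplicate set; the first read is treated as the representative.
duplicate_set = [
    (1101, 1000, 1000),  # representative
    (1101, 1050, 1000),  # close to the representative
    (1101, 5000, 5000),  # far from the representative
    (1101, 5040, 5010),  # close to the previous read only
    (1102, 1000, 1000),  # same coordinates, different tile
]

print(count_relative_to_representative(duplicate_set))  # 1
print(count_by_clusters(duplicate_set))                 # 2
```

Under these assumptions the group-based count can never be lower than the representative-based count, because every read close to the representative is already in the representative's group. That matches the developer's expectation of an increase, and leaves the observed decrease unexplained by this rule alone.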
