Picard Optical Duplicates calculation 2.0 vs 2.1+

tbooth (Edinburgh, UK) · Member · Posts: 1

Hi,

We have an old pipeline that has been running with Picard 1.141 and reporting various metrics including optical dupes in our Illumina data. I'm just upgrading it to use Picard 2.8.1 and I see that the optical dupe numbers are now coming out way lower - like two orders of magnitude lower! Digging in the GitHub history I think this is the commit that changed things, back in Feb 2016 just prior to release 2.1, contributed by Tim Fennell:

https://github.com/broadinstitute/picard/commit/2d68dd15775d0f7297af87979e041f3afe06dce1

And I've confirmed by running Picard 2.0 and 2.1 that the change is definitely between these releases.

I can't see from eyeballing the code if there was a fundamental change to the calculation or if maybe it's to do with the READ_NAME_REGEX setting - I've just been using the default as the reads follow the standard bcl2fastq naming convention. To save me having to do some sort of forensic analysis I wondered if anyone on the forum can explain the change or remembers any discussion that happened at the time? I've tested this on a single tile from a HiSeq 4000 run and also on unrelated data from a HiSeq 2500 lane so I don't think it's something special to my BAM files. Any ideas?
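For context on the READ_NAME_REGEX point, here is a minimal sketch of how a tile and x/y position can be pulled out of a standard Illumina read name. It assumes the default behaves like "take the last three ':'-separated fields as numeric tile, x and y", which is how the Picard documentation describes it. The function name `parse_location` and the example read name are hypothetical, and this is not Picard's actual code.

```python
def parse_location(read_name):
    """Return (tile, x, y) from the last three ':'-separated fields.

    Assumption: the read name follows the usual Illumina layout
    instrument:run:flowcell:lane:tile:x:y (as written by bcl2fastq).
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        raise ValueError(f"cannot find tile/x/y in read name: {read_name!r}")
    tile, x, y = (int(f) for f in fields[-3:])
    return tile, x, y


# Hypothetical read name, for illustration only.
example_location = parse_location("K00166:12:H3ABCBBXX:1:1101:1234:5678")
```

If read names do not end in three numeric fields, a parser like this fails loudly, which is one quick way to check whether the naming convention is the cause of a discrepancy.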

Many thanks in advance,

TIM

Issue · GitHub (by Sheila)
Issue Number: 1635
State: closed
Closed By: sooheelee

Answers

  • shlee (Cambridge) · Member, Administrator, Broadie, Moderator, Dev · Posts: 422 · admin
    edited January 17

    Hi @tbooth,

    Our developer says you should actually see the opposite: the changes should increase the number of optical duplicates, not decrease them.

    The original method said that an optical duplicate is a read that is close to the representative read. Now, any group of reads that are close to each other will have only one non-optical duplicate.

    Does this help clarify?
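The difference between the two counting rules described above can be sketched roughly as follows. This is an illustration only, not Picard's implementation. It assumes that "close" means same tile and within a fixed pixel distance in both x and y, and that groups are formed by chaining close reads together. The function names, the `max_dist` value of 100, and the example coordinates are all hypothetical.

```python
from itertools import combinations


def is_close(a, b, max_dist=100):
    """Assumed closeness test: same tile, within max_dist in both x and y."""
    return (a["tile"] == b["tile"]
            and abs(a["x"] - b["x"]) <= max_dist
            and abs(a["y"] - b["y"]) <= max_dist)


def count_representative_based(reads, rep_index=0, max_dist=100):
    """Older rule as described: count reads close to the representative."""
    rep = reads[rep_index]
    return sum(1 for i, r in enumerate(reads)
               if i != rep_index and is_close(rep, r, max_dist))


def count_group_based(reads, max_dist=100):
    """Newer rule as described: each group of close reads keeps exactly
    one non-optical duplicate; the rest count as optical duplicates."""
    parent = list(range(len(reads)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of close reads into the same group.
    for i, j in combinations(range(len(reads)), 2):
        if is_close(reads[i], reads[j], max_dist):
            parent[find(i)] = find(j)

    n_groups = len({find(i) for i in range(len(reads))})
    return len(reads) - n_groups


# One duplicate set: two pairs of nearby reads, far apart from each other.
# The first read is taken as the representative.
duplicate_set = [
    {"tile": 1101, "x": 0, "y": 0},
    {"tile": 1101, "x": 50, "y": 0},
    {"tile": 1101, "x": 5000, "y": 5000},
    {"tile": 1101, "x": 5050, "y": 5000},
]

old_count = count_representative_based(duplicate_set)
new_count = count_group_based(duplicate_set)
```

In this toy set the representative-based rule finds one optical duplicate (only the read next to the representative), while the group-based rule finds two (one per pair). Under these assumptions the newer rule can only give an equal or higher count, which is the direction the developer describes.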
