
Picard Optical Duplicates calculation 2.0 vs 2.1+

tbooth (Edinburgh, UK; Member)

Hi,

We have an old pipeline that has been running with Picard 1.141 and reporting various metrics, including optical dupes, in our Illumina data. I'm just upgrading it to use Picard 2.8.1 and I see that the optical dupe numbers are now coming out way lower - like two orders of magnitude lower! Digging in the GitHub history, I think this is the commit that changed things, back in Feb 2016 just prior to release 2.1, contributed by Tim Fennell:

https://github.com/broadinstitute/picard/commit/2d68dd15775d0f7297af87979e041f3afe06dce1

And I've confirmed by running Picard 2.0 and 2.1 that the change is definitely between these releases.

I can't see from eyeballing the code if there was a fundamental change to the calculation or if maybe it's to do with the READ_NAME_REGEX setting - I've just been using the default as the reads follow the standard bcl2fastq naming convention. To save me having to do some sort of forensic analysis I wondered if anyone on the forum can explain the change or remembers any discussion that happened at the time? I've tested this on a single tile from a HiSeq 4000 run and also on unrelated data from a HiSeq 2500 lane so I don't think it's something special to my BAM files. Any ideas?

Many thanks in advance,

TIM
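For readers who want to rule out the read-name parsing explanation raised above, here is a minimal sketch. bcl2fastq read names have the form `instrument:run:flowcell:lane:tile:x:y`, so the tile and x/y coordinates are the last three colon-separated fields. The helper name `parse_location` and the example read name are made up for illustration, and this is not Picard's own parser; it only checks that a name yields usable coordinates under that assumption.

```python
def parse_location(read_name):
    """Return (tile, x, y) from the last three colon-separated fields.

    Hypothetical helper: assumes bcl2fastq-style names
    (instrument:run:flowcell:lane:tile:x:y). Returns None when the
    name has too few fields or the fields are not integers.
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        return None
    try:
        tile, x, y = (int(field) for field in fields[-3:])
    except ValueError:
        return None
    return tile, x, y


# Made-up read name in bcl2fastq style, for illustration only.
example = "K00123:45:HXXXXXXXX:1:1101:1234:5678"
location = parse_location(example)
```

If a helper like this returns `None` for names taken from the BAM, the coordinates are not where the parser expects them, and no optical duplicates could be detected at all.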

GitHub issue #1635, filed by Sheila. State: closed (closed by sooheelee).

Answers

  • shlee (Cambridge; Member, Broadie, Moderator, admin), edited January 17

    Hi @tbooth,

    Our developer says you should actually see the opposite: the changes should increase the number of optical duplicates, not decrease them.

    The original method said that an optical duplicate is a read that is close to the representative read. Now, any group of reads that are close to one another will have only one non-optical duplicate.

    Does this help clarify?
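The two rules described in the answer can be compared with a small sketch. This is one reading of the forum description, not Picard's actual code: the `Read` type, the function names and the clustering by connected components are all assumptions made for illustration. "Close" is taken to mean same tile with x and y each within a pixel distance, defaulting to 100 to match Picard's `OPTICAL_DUPLICATE_PIXEL_DISTANCE` default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    tile: int
    x: int
    y: int


def is_close(a, b, max_dist=100):
    # Assumed closeness test: same tile, x and y each within max_dist.
    return (a.tile == b.tile
            and abs(a.x - b.x) <= max_dist
            and abs(a.y - b.y) <= max_dist)


def count_optical_old(reads, representative_index=0, max_dist=100):
    # Old rule as described: only reads close to the representative count.
    rep = reads[representative_index]
    return sum(1 for i, read in enumerate(reads)
               if i != representative_index and is_close(rep, read, max_dist))


def count_optical_new(reads, max_dist=100):
    # New rule as described: each group of mutually reachable close reads
    # keeps exactly one non-optical read. Groups are found with union-find.
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if is_close(reads[i], reads[j], max_dist):
                parent[find(i)] = find(j)

    groups = len({find(i) for i in range(len(reads))})
    return len(reads) - groups


# One duplicate set: the representative sits far from a tight cluster of three.
duplicate_set = [
    Read(1101, 100, 100),    # representative
    Read(1101, 5000, 5000),
    Read(1101, 5050, 5000),
    Read(1101, 5100, 5000),
]
old_count = count_optical_old(duplicate_set)
new_count = count_optical_new(duplicate_set)
```

In this made-up duplicate set the old rule finds no optical duplicates, because nothing lies near the representative, while the new rule counts two of the three clustered reads. Under this reading the new count can never be lower than the old one, which matches the developer's expectation and suggests the drop reported in the question has some other cause.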
