Picard Optical Duplicates calculation 2.0 vs 2.1+
We have an old pipeline that has been running with Picard 1.141 and reporting various metrics, including optical duplicates, for our Illumina data. I'm upgrading it to Picard 2.8.1 and the optical duplicate counts are now coming out far lower, by roughly two orders of magnitude! Digging through the GitHub history, I think this is the commit that changed things, contributed by Tim Fennell in Feb 2016, just prior to release 2.1:
And I've confirmed by running Picard 2.0 and 2.1 that the change is definitely between these releases.
I can't tell from eyeballing the code whether there was a fundamental change to the calculation or whether it has to do with the READ_NAME_REGEX setting. I've just been using the default, as the reads follow the standard bcl2fastq naming convention. To save me doing a forensic analysis, can anyone on the forum explain the change, or does anyone remember any discussion from the time? I've tested this on a single tile from a HiSeq 4000 run and also on unrelated data from a HiSeq 2500 lane, so I don't think it's something specific to my BAM files. Any ideas?
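For context, here is a minimal Python sketch of how I understand the optical duplicate test to work in general terms. It is not Picard's code. It assumes the tile, x and y coordinates are the last three colon-separated fields of a bcl2fastq-style read name, and that two duplicate reads count as optical when they sit on the same tile within a pixel distance on both axes (100 here, matching what I believe is the OPTICAL_DUPLICATE_PIXEL_DISTANCE default). The function names and read names are made up for illustration.

```python
def parse_location(read_name):
    """Return (tile, x, y) taken from the last three colon-separated fields.

    Assumes a bcl2fastq-style name such as
    instrument:run:flowcell:lane:tile:x:y (hypothetical example below).
    """
    fields = read_name.split(":")
    if len(fields) < 3:
        raise ValueError(f"cannot parse location from read name: {read_name!r}")
    tile, x, y = (int(field) for field in fields[-3:])
    return tile, x, y


def is_optical_pair(name_a, name_b, pixel_distance=100):
    """True if two duplicate reads are close enough to be called optical.

    Assumption: same tile, and both |dx| and |dy| within pixel_distance.
    Reads are assumed to come from the same lane / read group already.
    """
    tile_a, x_a, y_a = parse_location(name_a)
    tile_b, x_b, y_b = parse_location(name_b)
    return (
        tile_a == tile_b
        and abs(x_a - x_b) <= pixel_distance
        and abs(y_a - y_b) <= pixel_distance
    )


if __name__ == "__main__":
    # Made-up read names, not taken from real data.
    read_1 = "K00123:45:HABCDEXX:1:1101:1000:2000"
    read_2 = "K00123:45:HABCDEXX:1:1101:1050:2030"
    read_3 = "K00123:45:HABCDEXX:1:1102:1050:2030"
    print(is_optical_pair(read_1, read_2))  # same tile, nearby
    print(is_optical_pair(read_1, read_3))  # different tile
```

If the regex or field parsing picked up the wrong fields between versions, a check like this would give very different counts from the same reads, which is why I suspect READ_NAME_REGEX.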
Many thanks in advance,