EAMSS filter

Kurt Member ✭✭✭

Hello,

In Picard's IlluminaBasecallsToSam and IlluminaBasecallsToFastq, APPLY_EAMSS_FILTER defaults to true. We've noticed that newer versions of Illumina's bcl2fastq (version 2) do not perform EAMSS filtering, or even allow the user to select it. Just wondering whether your pipelines always perform EAMSS filtering regardless of the Illumina sequencing platform model (HiSeq, MiSeq, etc.).

Thanks in advance,

Kurt

Issue · Github
by Sheila

Issue Number: 765
State: open

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi Kurt,

    You raise a good point. From the Illumina docs:

    The EAMSS algorithm, or Read Segment Quality Control Metric, identifies segments at the end of reads that have low quality and may have unreliable quality scores. EAMSS replaces quality scores for all basecalls in these segments with a Qscore of 2. We recommend disabling EAMSS through the use of this option, particularly when bcl conversion output needs to match that from other Illumina fastq-generating processes, such as MiSeq Reporter or BaseSpace fastq generation. EAMSS is no longer required with current Illumina sequencing technology and is not applied in such newer applications.

    I think we may still apply it as a legacy thing but I'm not sure -- let me ping someone who would know.

  • Kurt Member ✭✭✭

    Thanks @Geraldine_VdAuwera!

    Yeah, we contacted Illumina support, who directed us to that part of the manual. Supposedly it has been a recommendation since April of 2013, although not the default.

    This was their reply about the logic of removing EAMSS filtering from bcl2fastq2:

    "Version 2 of bcl2fastq is based on the same pipeline as MSR and BaseSpace, so it does not use EAMSS masking. Essentially, as the quality of Illumina sequencing instruments, chemistry, and quality scoring algorithms increased over the years, the reliability of the ends of reads has become greater as compared to something like the older Genome Analyzer platform. This has enabled not only longer read lengths, but also made the EAMSS option not required. Since the newer software does not support the older platforms where you might conceivably want to use the option, it is not included in this version."

  • Kurt Member ✭✭✭

    We've also had it turned on the entire time. I still see some places where EAMSS appears to be doing what I think it should be doing (GC-rich regions, long homopolymer repeats), but I tend to look at old data a lot, so maybe the newer models don't need it anymore. I don't know. We have HiSeq 2500s and MiSeqs.

  • Kurt Member ✭✭✭

    Thanks Geraldine!

    We have some pre/post EAMSS-filtered data that we've just put through our GATK QC pipelines to get a feel for what the difference will look like. Just have to carve out some time to do it ;)

    Thanks again,

    Kurt

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Cool, let us know what you find out!
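
For anyone who wants a feel for what this kind of masking does to a read, here is a rough Python sketch of the behavior described in the Illumina docs quoted above: find a low-quality segment at the end of a read and overwrite its quality scores with Q2. This is not Illumina's or Picard's actual code. The function name, the tally scheme, and the two thresholds (30 and 15) are assumptions made purely for illustration, and it ignores any base-composition logic the real filter may apply.

```python
MASK_QUALITY = 2   # Q-score written over masked bases (per the quoted docs)
HIGH_Q = 30        # assumed threshold, illustrative only
LOW_Q = 15         # assumed threshold, illustrative only


def mask_low_quality_tail(quals):
    """Return a copy of `quals` with a low-quality 3' segment set to Q2.

    Hypothetical helper: walks from the 3' end toward the 5' end keeping a
    running tally. Low qualities raise the tally, high qualities lower it.
    The position where the tally peaks (at 1 or more) marks the start of
    the segment to mask.
    """
    tally = 0
    best_tally = 1             # a peak below 1 means "mask nothing"
    mask_start = len(quals)    # default: no bases masked

    for i in range(len(quals) - 1, -1, -1):
        q = quals[i]
        if q >= HIGH_Q:
            tally -= 2
        elif q >= LOW_Q:
            tally -= 1
        else:
            tally += 1
        if tally >= best_tally:
            best_tally = tally
            mask_start = i

    return quals[:mask_start] + [MASK_QUALITY] * (len(quals) - mask_start)


if __name__ == "__main__":
    read_quals = [35, 36, 34, 33, 10, 8, 5, 2]
    print(mask_low_quality_tail(read_quals))
```

Comparing the Q2 fraction per cycle in FASTQs generated with APPLY_EAMSS_FILTER on and off would be one simple way to see how much of each read the real filter touches on a given instrument.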
