Question Regarding GATK Doc #6483
I just happened to notice something strange in this document.
In my own practice, I always remove adapters at the very beginning (at the demultiplexing stage) and then continue my analyses from FASTQ to uBAM to mapping, and so on.
However, I recently received some external data for analysis and realized that there is about 9 percent adapter contamination in the FASTQs. It looks like adapter cleanup was skipped at the demultiplexing stage.
As a personal preference, I am against removing anything from the FASTQs after the demultiplexing stage, and I am totally against trimmers, since they tend to mess up the order of reads and further complicate debugging of pipelines already established in production.
So I decided to try MarkIlluminaAdapters, since it lets me mark the adapters and rescore them to QV2 so that they won't interfere with my analyses. According to document #6483, after the MarkIlluminaAdapters step the uBAM is streamed to BWA and then to MergeBamAlignment to create a clean BAM. However, at that final stage the marked adapter bases with QV2 are reverted to their original quality values (QV > 30 for most of them!). This concerns me.
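A minimal Python sketch of the quality masking in question, to make the concern concrete. It assumes that the XT tag holds the 1-based position where the adapter starts (as MarkIlluminaAdapters writes it) and that CLIPPING_ACTION=2 overwrites the qualities of every base from that position to the end of the read with Q2. The function names and the example read are hypothetical, and the "merge" at the end is only a toy model that reproduces the behaviour observed above, not MergeBamAlignment itself.

```python
from typing import List, Optional

MASK_Q = 2  # quality written over the adapter region (CLIPPING_ACTION=2)


def mask_adapter_qualities(quals: List[int], xt: Optional[int],
                           mask_q: int = MASK_Q) -> List[int]:
    """Hypothetical helper: set every quality from the 1-based XT position
    to the end of the read to mask_q; reads without an XT tag are unchanged."""
    if xt is None:
        return list(quals)
    start = max(xt - 1, 0)  # convert the 1-based XT position to a 0-based index
    return quals[:start] + [mask_q] * (len(quals) - start)


def to_phred33(quals: List[int]) -> str:
    """Encode integer qualities as a Phred+33 FASTQ quality string."""
    return "".join(chr(q + 33) for q in quals)


# Hypothetical 8-base read whose adapter begins at base 6 (XT:i:6)
ubam_quals = [37, 37, 38, 38, 39, 35, 36, 34]
fastq_quals = mask_adapter_qualities(ubam_quals, xt=6)

# What the aligner sees: the adapter bases carry Q2 ('#')
print(to_phred33(fastq_quals))  # FFGGH###

# Toy model of the merge step: the final record keeps the uBAM qualities,
# so the adapter bases are back at Q34-Q36
merged_quals = list(ubam_quals)
print(to_phred33(merged_quals))  # FFGGHDEC
```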
Could someone from the GATK team comment on this: why do we mark the adapters if their original qualities are restored anyway?
Am I missing something?
My current practice is to mark the adapters, convert the XT-tagged uBAM to FASTQ with clipping action 2 (CLIPPING_ACTION=2), start mapping from this FASTQ, and also generate a second uBAM from these new FASTQs, which contain the adapter sequences with QV2.
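The workflow just described could be sketched as the following sequence of commands, built in Python so they can be inspected before running. The file names, the picard.jar path and the helper functions are hypothetical; the Picard tool names and KEY=VALUE options are the standard ones, but they should be checked against the Picard version in use.

```python
from typing import List

# Hypothetical path to the Picard jar; all file names below are placeholders.
PICARD = ["java", "-jar", "picard.jar"]


def picard(tool: str, **opts: str) -> List[str]:
    """Build a Picard command line using the KEY=VALUE argument syntax."""
    return PICARD + [tool] + [f"{k}={v}" for k, v in opts.items()]


def workflow(ubam: str, ref: str, sample: str) -> List[List[str]]:
    """Hypothetical helper: commands for the mask-then-rebuild approach."""
    return [
        # 1. Tag the adapter start position of each read with XT
        picard("MarkIlluminaAdapters", INPUT=ubam, OUTPUT="marked.bam",
               METRICS="marked_metrics.txt"),
        # 2. Write paired FASTQs with adapter qualities set to Q2
        picard("SamToFastq", INPUT="marked.bam", FASTQ="r1.fq",
               SECOND_END_FASTQ="r2.fq", CLIPPING_ATTRIBUTE="XT",
               CLIPPING_ACTION="2"),
        # 3. Align the masked reads; bwa writes SAM to stdout,
        #    so redirect it to aligned.sam when running
        ["bwa", "mem", "-M", ref, "r1.fq", "r2.fq"],
        # 4. Rebuild a uBAM from the masked FASTQs so that Q2 survives the merge
        picard("FastqToSam", FASTQ="r1.fq", FASTQ2="r2.fq",
               OUTPUT="masked_unmapped.bam", SAMPLE_NAME=sample),
        # 5. Merge the alignments with the rebuilt uBAM
        picard("MergeBamAlignment", ALIGNED_BAM="aligned.sam",
               UNMAPPED_BAM="masked_unmapped.bam", OUTPUT="merged.bam",
               REFERENCE_SEQUENCE=ref),
    ]


if __name__ == "__main__":
    for cmd in workflow("input_unmapped.bam", "ref.fasta", "SAMPLE1"):
        print(" ".join(cmd))
```

In a real run, FastqToSam would also need the read-group fields of the original uBAM (for example READ_GROUP_NAME, LIBRARY_NAME, PLATFORM_UNIT); otherwise that metadata is lost when the uBAM is rebuilt from FASTQ.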
Or am I misunderstanding this?
Thanks for the help.