NextSeq 500 Lanes and BaseRecalibrator
I am working with RNA-seq data from a NextSeq 500: 144 samples in total, sequenced over 6 runs, with 24 samples multiplexed into an equimolar pool for each run.
The NextSeq flowcell consists of 4 lanes that are supplied from a single reservoir, so the same pool is necessarily sequenced on all 4 lanes. On other platforms such as the HiSeq, the 8 lanes have to be loaded physically separately, even if they are sequencing the same pool.
My understanding is that BaseRecalibrator should be run for each lane of data, which I can specify using the PU read group tag. My sequencing provider de-multiplexed the raw FASTQ reads by sample, but not by lane. I therefore have 144 BAM files, one per sample, each containing the reads for that sample (mapped with STAR) from all 4 lanes of its run. To assign different PU read group tags to reads from different lanes using Picard, I would need to split either the BAM file or the raw FASTQ files by lane and then process the 4 resulting files separately.
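To make the per-lane split concrete, here is a minimal sketch of how the lane could be recovered from each read name and turned into read group fields. It assumes the standard Illumina read-name layout (instrument:run:flowcell:lane:tile:x:y) and that the names survive alignment unchanged, which I have not verified for every aligner setting. The function names, the example read name, the sample name "sample01", and the ID/PU naming scheme (flowcell.lane.sample) are my own illustrative choices, not something required by Picard or GATK.

```python
import re

# Assumed Illumina read-name layout (CASAVA 1.8+ style):
#   instrument:run:flowcell:lane:tile:x:y
# An optional leading "@" is allowed so FASTQ header lines also match.
READ_NAME = re.compile(r"^@?([^:\s]+):(\d+):([^:\s]+):(\d+):(\d+):(\d+):(\d+)")


def lane_info(read_name):
    """Return (flowcell, lane) parsed from an Illumina-style read name."""
    match = READ_NAME.match(read_name)
    if match is None:
        raise ValueError(f"unrecognised read name: {read_name!r}")
    flowcell = match.group(3)
    lane = int(match.group(4))
    return flowcell, lane


def read_group(read_name, sample):
    """Build hypothetical per-lane read group fields for one read."""
    flowcell, lane = lane_info(read_name)
    return {
        "ID": f"{sample}.{flowcell}.{lane}",  # unique per sample and lane
        "PU": f"{flowcell}.{lane}.{sample}",  # platform unit: flowcell + lane
        "SM": sample,
        "PL": "ILLUMINA",
    }


if __name__ == "__main__":
    # Hypothetical read name, for illustration only.
    name = "NS500123:45:HABCDEFXX:3:11401:1234:5678"
    print(read_group(name, "sample01"))
```

Grouping reads by the returned lane would give the 4 per-lane subsets, each of which could then receive its own read group before recalibration.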
Should I treat each of these 4 NextSeq lanes separately, as I would HiSeq lanes, or can I run BaseRecalibrator on all 4 lanes together since they are supplied from one reservoir?