NextSeq 500 Lanes and BaseRecalibrator
I am working with RNA-seq data from the NextSeq 500: 144 samples, with 24 samples multiplexed into an equimolar pool for each of the 6 runs.
The NextSeq flowcell consists of 4 lanes that are supplied from a single reservoir, so the same pool is necessarily sequenced on all 4 lanes. On other platforms such as the HiSeq, the 8 lanes have to be physically loaded separately, even if they are sequencing the same pool.
My understanding is that BaseRecalibrator should be run for each lane of data, which I can specify using the PU read group tag. My sequencing provider demultiplexed the raw FASTQ reads by sample but not by lane, so I currently have 144 BAM files, one per sample, each containing the reads (mapped with STAR) for that sample from all 4 lanes of its run. To assign a different PU read group tag to the reads from each lane using Picard, I would need to split either the BAM file or the raw FASTQ files by lane, then process the 4 files separately.
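To make the splitting step concrete, here is a minimal sketch of how the lane could be recovered per read. It assumes the read names follow the standard Illumina (Casava 1.8+) layout `instrument:run:flowcell:lane:tile:x:y`, and that the PU value follows the commonly used `FLOWCELL.LANE.SAMPLE_BARCODE` convention. The function names, the instrument and flowcell IDs, and the barcode are all made up for illustration, and this only covers the grouping logic, not the actual BAM/FASTQ I/O.

```python
from collections import defaultdict


def parse_illumina_read_name(read_name):
    """Return (flowcell, lane) from a Casava 1.8+ style read name.

    Assumed layout: instrument:run:flowcell:lane:tile:x:y
    A leading '@' and anything after the first whitespace are ignored.
    """
    name = read_name.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError(f"not a Casava 1.8+ read name: {read_name!r}")
    return fields[2], int(fields[3])


def platform_unit(read_name, sample_barcode):
    """Build a PU string as FLOWCELL.LANE.SAMPLE_BARCODE (assumed convention)."""
    flowcell, lane = parse_illumina_read_name(read_name)
    return f"{flowcell}.{lane}.{sample_barcode}"


def group_by_platform_unit(read_names, sample_barcode):
    """Bucket read names by the PU they would be assigned to."""
    groups = defaultdict(list)
    for read_name in read_names:
        groups[platform_unit(read_name, sample_barcode)].append(read_name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical read names from one sample, spread over two lanes.
    names = [
        "@NS500123:45:HABCDEFXX:1:11101:1234:5678 1:N:0:ACGTAC",
        "@NS500123:45:HABCDEFXX:3:11101:4321:8765 1:N:0:ACGTAC",
        "@NS500123:45:HABCDEFXX:3:11102:1111:2222 1:N:0:ACGTAC",
    ]
    for pu, members in sorted(group_by_platform_unit(names, "ACGTAC").items()):
        print(pu, len(members))
```

In practice the same parsing would be applied to the QNAME field of each BAM record (or each FASTQ header) to decide which per-lane file or read group a read belongs to. Whether that split is actually needed for NextSeq data is exactly what I am asking below.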
Should I be treating each of these 4 NextSeq lanes separately like HiSeq lanes, or can I run BaseRecalibrator on all 4 lanes together since the lanes are supplied from one reservoir?