Sample prepared from multiple flowcells and multiple lanes

wungjaelee, Northwestern University, Member


I have multiple paired-end FASTQ files from a single biological sample that was sequenced on three flowcells, two lanes each.
In short, I have:

sample_flowcell1_lane1.R1.fastq.gz sample_flowcell1_lane1.R2.fastq.gz
sample_flowcell1_lane2.R1.fastq.gz sample_flowcell1_lane2.R2.fastq.gz
sample_flowcell2_lane1.R1.fastq.gz sample_flowcell2_lane1.R2.fastq.gz
sample_flowcell2_lane2.R1.fastq.gz sample_flowcell2_lane2.R2.fastq.gz
sample_flowcell3_lane1.R1.fastq.gz sample_flowcell3_lane1.R2.fastq.gz
sample_flowcell3_lane2.R1.fastq.gz sample_flowcell3_lane2.R2.fastq.gz

where R1, R2 are paired-end reads.

I'm trying to generate a single BAM file from these FASTQs with bwa mem and samtools against the GRCh37 reference,
and then ultimately run the whole-exome sequencing analysis with the following procedure:
bwa mem for each of the 6 sets of paired-end reads
samtools sort for each of the 6 generated BAMs
samtools merge -r for the 6 sorted BAMs to produce a single BAM
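
For concreteness, the three steps above could be sketched as the small Python script below, which only builds and prints the command lines (it does not run them). Several things here are assumptions rather than part of my actual setup: the reference path `GRCh37.fa`, the output names `*.sorted.bam` and `sample.merged.bam`, the read-group values (one `ID`/`PU` per flowcell and lane, a single `LB` shared by all lanes), and the choice to set read groups with `bwa mem -R` at alignment time so that a plain `samtools merge` is enough.

```python
import shlex

SAMPLE = "sample"
REFERENCE = "GRCh37.fa"  # hypothetical path to a bwa-indexed GRCh37 FASTA


def lane_units(sample, flowcells=3, lanes=2):
    """Yield (flowcell, lane, r1, r2) for the file layout listed above."""
    for fc in range(1, flowcells + 1):
        for lane in range(1, lanes + 1):
            prefix = f"{sample}_flowcell{fc}_lane{lane}"
            yield fc, lane, f"{prefix}.R1.fastq.gz", f"{prefix}.R2.fastq.gz"


def read_group(sample, fc, lane):
    """Build an @RG line; bwa mem -R expects literal backslash-t separators."""
    # Assumption: one library per sample, one read group per flowcell/lane.
    return (
        f"@RG\\tID:{sample}.flowcell{fc}.lane{lane}"
        f"\\tSM:{sample}\\tLB:{sample}\\tPL:ILLUMINA"
        f"\\tPU:flowcell{fc}.lane{lane}"
    )


def build_commands(sample=SAMPLE, reference=REFERENCE):
    """Return six align-and-sort commands followed by one merge command."""
    commands, bams = [], []
    for fc, lane, r1, r2 in lane_units(sample):
        bam = f"{sample}_flowcell{fc}_lane{lane}.sorted.bam"
        rg = shlex.quote(read_group(sample, fc, lane))
        # Align one lane and pipe straight into a coordinate sort.
        commands.append(
            f"bwa mem -R {rg} {reference} {r1} {r2} | samtools sort -o {bam} -"
        )
        bams.append(bam)
    # Merge the six sorted per-lane BAMs into a single BAM.
    commands.append(f"samtools merge {sample}.merged.bam " + " ".join(bams))
    return commands


if __name__ == "__main__":
    print("\n".join(build_commands()))
```

Running the script prints seven shell commands: one `bwa mem | samtools sort` pipeline per flowcell/lane pair, then a single `samtools merge`.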

Then start the GATK process on the merged.bam
picard.jar AddOrReplaceReadGroups
picard.jar MarkDuplicates
picard.jar ReorderSam
GATK RealignerTargetCreator
GATK IndelRealigner
GATK BaseRecalibrator
and so on ...

I am not sure what the best way is to merge these FASTQs and generate a single BAM.
Could you recommend how I should generate a single BAM from these FASTQs?

