Sample prepared from multiple flowcells and multiple lanes
I have multiple paired-end FASTQ files from a single biological sample that was sequenced across three flowcells, two lanes each.
In short, I have
where R1, R2 are paired-end reads.
I'm trying to generate a single BAM file from these FASTQs with bwa mem and samtools, aligned to the GRCh37 reference,
and then ultimately run a whole exome sequencing analysis with the following procedure:
bwa mem for each of the 6 sets of paired-end reads
samtools sort for each of the 6 generated BAMs
samtools merge -r on the 6 sorted BAMs to produce a single BAM
Then start the GATK process on merged.bam
and so on ...
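
For concreteness, the align, sort and merge steps above could be sketched as the following dry run, which only prints the commands and does not execute them. The reference path, the flowcell and lane labels, and the FASTQ and BAM file names are all hypothetical placeholders, since the actual file list is not shown here. Piping bwa mem straight into samtools sort is also just one possible arrangement of the first two steps.

```python
from itertools import product

# Hypothetical placeholders: the real file names are not given in the post.
REFERENCE = "GRCh37.fa"
FLOWCELLS = ["FC1", "FC2", "FC3"]
LANES = ["L1", "L2"]


def build_commands(reference=REFERENCE, flowcells=FLOWCELLS, lanes=LANES,
                   merged="merged.bam"):
    """Return the shell commands for the align/sort/merge steps as strings."""
    commands = []
    sorted_bams = []
    for flowcell, lane in product(flowcells, lanes):
        unit = f"{flowcell}_{lane}"
        sorted_bam = f"{unit}.sorted.bam"
        # Align one flowcell/lane pair of FASTQs and coordinate-sort the result.
        commands.append(
            f"bwa mem {reference} {unit}_R1.fastq.gz {unit}_R2.fastq.gz"
            f" | samtools sort -o {sorted_bam} -"
        )
        sorted_bams.append(sorted_bam)
    # Merge the sorted BAMs; -r attaches an RG tag inferred from each file name.
    commands.append(f"samtools merge -r {merged} " + " ".join(sorted_bams))
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

With three flowcells and two lanes this prints six alignment commands followed by one merge command.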
I am not sure what the best way is to merge these FASTQs and generate a single BAM.
Could you recommend how I should generate a single BAM from these FASTQs?