Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
IndelRealign and baseRecalibration on a large whole genome sample
I am doing some whole genome sequence on 2 samples in which each sample was run on 12 lanes of a SOLiD 5500 machine. These are at fairly high coverage of ~40x each. My plan was align each lane independently then merge all 12 lane for each sample into 1 large bam file, then do the post-processing. I did this and was able to do the indel realign on both samples but have been having trouble with the base recalibration step, in which when apply the base recalibration PrintReads crashes with an error saying that there is not enough memory available. I have tried changing the tmp directory that is used and any other trick that I have been able to find on the forum.
I was wondering if an alternate and suitable approach would be to perform all of the post-processing on each individual lane first, then merge all of the lanes together after that. Would doing that have any adverse affect on the downstream analysis, i.e snps, cnvs, translocations, etc.