Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime you need to reindex gzipped VCFs using Tabix.

IndelRealign and baseRecalibration on a large whole genome sample

elondinelondin Posts: 3Member

Hi, I am doing some whole genome sequence on 2 samples in which each sample was run on 12 lanes of a SOLiD 5500 machine. These are at fairly high coverage of ~40x each. My plan was align each lane independently then merge all 12 lane for each sample into 1 large bam file, then do the post-processing. I did this and was able to do the indel realign on both samples but have been having trouble with the base recalibration step, in which when apply the base recalibration PrintReads crashes with an error saying that there is not enough memory available. I have tried changing the tmp directory that is used and any other trick that I have been able to find on the forum.

I was wondering if an alternate and suitable approach would be to perform all of the post-processing on each individual lane first, then merge all of the lanes together after that. Would doing that have any adverse affect on the downstream analysis, i.e snps, cnvs, translocations, etc.

Thanks

Answers

Sign In or Register to comment.