IndelRealign and baseRecalibration on a large whole genome sample

elondin Posts: 3 Member

I am doing whole-genome sequencing on 2 samples, each of which was run on 12 lanes of a SOLiD 5500 machine, at fairly high coverage of ~40x each. My plan was to align each lane independently, merge all 12 lanes for each sample into 1 large BAM file, and then do the post-processing. I did this and was able to run indel realignment on both samples, but I have been having trouble with the base recalibration step: when I apply the recalibration, PrintReads crashes with an error saying that there is not enough memory available. I have tried changing the tmp directory that is used, along with every other trick I have been able to find on the forum.

I was wondering whether a suitable alternative would be to perform all of the post-processing on each individual lane first and then merge the lanes afterwards. Would doing that have any adverse effect on the downstream analysis (SNPs, CNVs, translocations, etc.)?



  • Geraldine_VdAuwera Posts: 9,962 Administrator, Dev admin

    Hi there,

    Are you using the latest version of GATK, i.e. 2.5-2? Previous versions had a bug that caused PrintReads to require excessive memory, but we have since fixed it.

    Geraldine Van der Auwera, PhD

  • elondin Posts: 3 Member

    No, I'm using 2.4. I'll try upgrading and see how it goes.
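
For readers following along, here is a rough sketch of the two recalibration steps discussed in this thread, written as a small Python helper that only builds the command lines and does not run anything. The tool names and the -T, -R, -I, -knownSites, -BQSR and -o arguments are the standard GATK 2.x ones; the file names, heap size and tmp directory are hypothetical placeholders that were not given in the thread, and you would need to adjust them for your own data.

```python
# Sketch only: builds GATK 2.x command lines as argument lists, runs nothing.
# All file names, the heap size and the tmp directory are hypothetical.
GATK_JAR = "GenomeAnalysisTK.jar"


def gatk_command(tool, args, heap="8g", tmpdir="/scratch/tmp"):
    """Return a java command line that invokes one GATK walker."""
    return [
        "java",
        f"-Xmx{heap}",                 # JVM maximum heap size (assumed value)
        f"-Djava.io.tmpdir={tmpdir}",  # JVM temp directory (assumed path)
        "-jar", GATK_JAR,
        "-T", tool,
    ] + list(args)


def recalibration_commands(bam, reference, known_sites, prefix):
    """Return the (BaseRecalibrator, PrintReads) commands for one BAM file."""
    table = f"{prefix}.recal.grp"
    # Step 1: model the base quality errors and write a recalibration table.
    build_table = gatk_command("BaseRecalibrator", [
        "-R", reference,
        "-I", bam,
        "-knownSites", known_sites,
        "-o", table,
    ])
    # Step 2: apply the table while rewriting the reads to a new BAM file.
    apply_table = gatk_command("PrintReads", [
        "-R", reference,
        "-I", bam,
        "-BQSR", table,
        "-o", f"{prefix}.recal.bam",
    ])
    return build_table, apply_table


if __name__ == "__main__":
    commands = recalibration_commands(
        "sample1.realigned.bam", "reference.fasta", "dbsnp.vcf", "sample1"
    )
    for command in commands:
        print(" ".join(command))
```

The same helper can be called once per merged sample or once per lane, so it fits either of the two workflows raised in the question. It does not say which of them is preferable.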
