Performance of ReduceReads
I have a set of 250 BAM files with whole-genome sequence of trios (3 samples per BAM) at 12x coverage. Each BAM file is currently about 200GB-250GB.
I am currently processing a number of these BAMs with ReduceReads before running them through the Unified Genotyper (UG), and I was wondering whether the run times I'm getting are in line with what I should expect. If they are, do you think the time spent on ReduceReads will be regained when calling with the UG?
I've tried running ReduceReads in 2 different ways:
Per trio: Each BAM is input and output as a whole. This gives estimated run times of ~5 days.
java -Xmx4g -Djava.io.tmpdir=/local -jar ~/tools/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar \
  -T ReduceReads \
  -R human_g1k_v37.fa \
  -I A4.human_g1k_v37.trio_realigned.bam \
  -o A4.human_g1k_v37.trio_realigned.reduced.bam
Per individual: Each BAM is input with a -rgbl option for every read group that does not belong to the individual, so I would run 3 ReduceReads processes on each trio BAM. Each of these gives an estimated run time of about 2 days.
java -Xmx4g -Djava.io.tmpdir=/local -jar ~/tools/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar \
  -T ReduceReads \
  -R human_g1k_v37.fa \
  -I A4.human_g1k_v37.trio_realigned.bam \
  -o A4a.human_g1k_v37.trio_realigned.reduced.bam \
  -rgbl ID:L3 \
  -rgbl ID:L5 \
  -rgbl ID:L6.1 \
  -rgbl ID:L6.2 \
  -rgbl ID:L7.1 \
  -rgbl ID:L7.2 \
  -rgbl ID:L7.3 \
  -rgbl ID:L7.4
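For a sense of scale, here is a rough back-of-the-envelope total for the whole set. It assumes the estimates above (~5 days per trio, ~2 days per individual) hold for all 250 BAMs, and it counts process-days rather than wall-clock time, since jobs could run in parallel. The variable names are just for illustration.

```shell
#!/bin/sh
# Rough totals, assuming the per-job estimates above apply to every BAM.
n_bams=250            # trio BAMs in the set
per_trio_days=5       # estimated run time, one ReduceReads job per trio BAM
per_indiv_days=2      # estimated run time, one ReduceReads job per individual
indiv_per_bam=3       # three samples per trio BAM

per_trio_total=$((n_bams * per_trio_days))
per_indiv_total=$((n_bams * indiv_per_bam * per_indiv_days))

echo "Per trio:       ${per_trio_total} process-days"    # 1250
echo "Per individual: ${per_indiv_total} process-days"   # 1500
```

So the per-individual approach would cost somewhat more in total compute under these estimates, but each job is shorter and there are three times as many to spread across nodes.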
Thanks a lot!