Performance of ReduceReads
I have a set of 250 BAM files with whole-genome sequence of trios (3 samples per BAM) at 12x coverage. Each BAM file is currently about 200GB-250GB.
I am currently trying to process a number of BAMs with ReduceReads prior to using them with the Unified Genotyper, and was wondering whether the run times I'm getting are in line with what I should expect. If they are, do you think the time spent on ReduceReads would be regained when calling with the UG?
I've tried running ReduceReads in 2 different ways:
Per trio: Each BAM is input/output as a whole. This gives estimated run times of ~5 days.
java -Xmx4g -Djava.io.tmpdir=/local -jar ~/tools/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar \
  -T ReduceReads \
  -R human_g1k_v37.fa \
  -I A4.human_g1k_v37.trio_realigned.bam \
  -o A4.human_g1k_v37.trio_realigned.reduced.bam
Per individual: Each BAM is input with the option -rgbl given for every read group that does not belong to the individual. This way I would run 3 ReduceReads processes on each trio BAM. Each of these gives an estimated run time of 2 days.
java -Xmx4g -Djava.io.tmpdir=/local -jar ~/tools/GenomeAnalysisTK-2.1-8-g5efb575/GenomeAnalysisTK.jar \
  -T ReduceReads \
  -R human_g1k_v37.fa \
  -I A4.human_g1k_v37.trio_realigned.bam \
  -o A4a.human_g1k_v37.trio_realigned.reduced.bam \
  -rgbl ID:L3 \
  -rgbl ID:L5 \
  -rgbl ID:L6.1 \
  -rgbl ID:L6.2 \
  -rgbl ID:L7.1 \
  -rgbl ID:L7.2 \
  -rgbl ID:L7.3 \
  -rgbl ID:L7.4
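To put the two options side by side, here is a rough tally of the total run time across all 250 BAMs. It is only a sketch: it assumes the estimates above hold for every file and that each ReduceReads process is a separate single job, and the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope totals from the estimates quoted above (not measured).
n_bams=250              # trio BAMs in the set
individuals=3           # samples per trio BAM
days_per_trio=5         # per-trio run: ~5 days per BAM
days_per_individual=2   # per-individual run: ~2 days per process

# Total process-days if every BAM is reduced as a whole.
per_trio_total=$(( n_bams * days_per_trio ))

# Total process-days if every BAM is reduced once per individual.
per_individual_total=$(( n_bams * individuals * days_per_individual ))

echo "per trio:       ${per_trio_total} process-days"
echo "per individual: ${per_individual_total} process-days"
```

Under these assumptions the per-individual route costs more in total (1500 versus 1250 process-days), but each BAM would be finished in about 2 days instead of 5 if its three processes can run at the same time.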
Thanks a lot!