
ReduceReads running slowly

Hi there,

I'm working with two whole-genome sequencing datasets at about 40x coverage. Unfortunately, our cluster cannot run Queue, so I've had to split the files by chromosome to get reasonable ReduceReads processing times. For one sample, ReduceReads took between 0.5 and 7 hours, depending on the chromosome. For the other sample, the runtime ranged from 16 to 90 hours, and six chromosomes still have not finished after 90 hours. The two samples are closely matched in coverage, quality and source, so I'm puzzled as to what the difference could be. I should mention that the slowdown appears to be patchy: tens of megabases are sometimes processed in minutes, while at other loci ReduceReads takes hours to process even a few kb. I've attached a snapshot of the ReduceReads output for chromosome 11 for each sample. Could you guys give me any pointers as to what can cause such a dramatic slowdown in ReduceReads, either at specific loci or in general?
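Without Queue, each chromosome has to be submitted as its own job. A minimal sketch of generating those per-chromosome commands is below. It reuses the `-T`, `-R`, `-I` and `-o` arguments and the file-naming pattern from the Program Args in the logs; the jar name `GenomeAnalysisTK.jar`, the helper `reduce_reads_commands` and the chromosome list are assumptions for illustration.

```python
import shlex
from typing import List

# Reference path copied from the Program Args in the logs.
REFERENCE = "/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa"


def reduce_reads_commands(sample: str, chromosomes: List[str]) -> List[str]:
    """Hypothetical helper: one ReduceReads command per chromosome-split BAM."""
    commands = []
    for chrom in chromosomes:
        # Naming pattern assumed from the per-chromosome files in the logs.
        in_bam = f"{sample}.rmDup.bam.realignedBam.{chrom}.recal.bam"
        out_bam = f"{sample}.rmDup.bam.realignedBam.{chrom}.recal.reducedReads.bam"
        commands.append(
            "java -jar GenomeAnalysisTK.jar -T ReduceReads"  # jar name is an assumption
            f" -R {shlex.quote(REFERENCE)}"
            f" -I {shlex.quote(in_bam)}"
            f" -o {shlex.quote(out_bam)}"
        )
    return commands


if __name__ == "__main__":
    chroms = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
    for cmd in reduce_reads_commands("FP-43_dsDNA", chroms):
        print(cmd)
```

Each printed line can then be wrapped in whatever submission command the cluster's scheduler uses.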

Thanks, Kevin

GATK output for Sample A (the one that completed), chromosome 11:
INFO 11:49:57,034 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:49:57,066 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO 11:49:57,066 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 11:49:57,066 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:49:57,070 HelpFormatter - Program Args: -R /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -T ReduceReads -I FP-44_dsDNA.rmDup.bam.realignedBam.chr11.recal.bam -o FP-44_dsDNA.rmDup.bam.realignedBam.chr11.recal.reducedReads.bam
INFO 11:49:57,070 HelpFormatter - Date/Time: 2013/02/13 11:49:57
INFO 11:49:57,071 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:49:57,071 HelpFormatter - --------------------------------------------------------------------------------
INFO 11:49:57,146 GenomeAnalysisEngine - Strictness is SILENT
INFO 11:49:57,243 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 11:49:57,251 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 11:49:58,604 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 1.35
INFO 11:49:59,242 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 11:49:59,243 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 11:49:59,359 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 11:49:59,362 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 11:50:29,254 ProgressMeter - chr11:532337 5.00e+05 30.0 s 60.0 s 58.7% 51.1 s 21.1 s
INFO 11:50:59,320 ProgressMeter - chr11:1716785 1.81e+06 60.1 s 33.2 s 58.7% 102.3 s 42.2 s
INFO 11:51:29,329 ProgressMeter - chr11:2908145 3.18e+06 90.1 s 28.3 s 58.8% 2.6 m 63.2 s
INFO 11:51:59,338 ProgressMeter - chr11:4443832 4.53e+06 2.0 m 26.5 s 58.8% 3.4 m 84.1 s
INFO 11:52:29,350 ProgressMeter - chr11:6508893 6.21e+06 2.5 m 24.2 s 58.9% 4.2 m 104.9 s
INFO 11:53:00,480 ProgressMeter - chr11:8254501 7.71e+06 3.0 m 23.5 s 58.9% 5.1 m 2.1 m
INFO 11:53:30,489 ProgressMeter - chr11:8271188 7.73e+06 3.5 m 27.3 s 58.9% 6.0 m 2.5 m
INFO 11:54:00,498 ProgressMeter - chr11:8279094 7.74e+06 4.0 m 31.2 s 58.9% 6.8 m 2.8 m
INFO 11:54:30,507 ProgressMeter - chr11:8287403 7.75e+06 4.5 m 35.0 s 58.9% 7.7 m 3.2 m
INFO 11:55:00,516 ProgressMeter - chr11:8287403 7.75e+06 5.0 m 38.9 s 58.9% 8.5 m 3.5 m
INFO 11:55:30,526 ProgressMeter - chr11:9913770 9.04e+06 5.5 m 36.6 s 59.0% 9.4 m 3.8 m
INFO 11:56:00,535 ProgressMeter - chr11:11615519 1.06e+07 6.0 m 34.1 s 59.0% 10.2 m 4.2 m
INFO 11:56:30,653 ProgressMeter - chr11:13031369 1.19e+07 6.5 m 32.9 s 59.1% 11.0 m 4.5 m
INFO 11:57:02,396 ProgressMeter - chr11:15075738 1.35e+07 7.1 m 31.3 s 59.1% 11.9 m 4.9 m

GATK output for the problematic sample, chromosome 11:
INFO 15:18:54,799 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:18:54,856 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14
INFO 15:18:54,856 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 15:18:54,856 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 15:18:54,860 HelpFormatter - Program Args: -R /fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa -T ReduceReads -I FP-43_dsDNA.rmDup.bam.realignedBam.chr11.recal.bam -o FP-43_dsDNA.rmDup.bam.realignedBam.chr11.recal.reducedReads.bam
INFO 15:18:54,861 HelpFormatter - Date/Time: 2013/02/16 15:18:54
INFO 15:18:54,861 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:18:54,861 HelpFormatter - --------------------------------------------------------------------------------
INFO 15:18:54,931 GenomeAnalysisEngine - Strictness is SILENT
INFO 15:18:55,045 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 15:18:55,051 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 15:18:55,184 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.13
INFO 15:18:55,238 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 15:18:55,238 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 15:18:55,343 ReadShardBalancer$1 - Loading BAM index data for next contig
INFO 15:18:55,345 ReadShardBalancer$1 - Done loading BAM index data for next contig
INFO 15:19:25,250 ProgressMeter - chr11:557669 5.60e+05 30.0 s 53.6 s 58.7% 51.1 s 21.1 s
INFO 15:19:55,262 ProgressMeter - chr11:573437 5.80e+05 60.0 s 103.5 s 58.7% 102.3 s 42.3 s
INFO 15:20:25,272 ProgressMeter - chr11:573437 5.80e+05 90.0 s 2.6 m 58.7% 2.6 m 63.4 s
INFO 15:20:55,282 ProgressMeter - chr11:582364 5.90e+05 2.0 m 3.4 m 58.7% 3.4 m 84.5 s
INFO 15:21:25,291 ProgressMeter - chr11:582364 5.90e+05 2.5 m 4.2 m 58.7% 4.3 m 105.7 s
INFO 15:21:55,302 ProgressMeter - chr11:588059 6.00e+05 3.0 m 5.0 m 58.7% 5.1 m 2.1 m
INFO 15:22:25,311 ProgressMeter - chr11:588059 6.00e+05 3.5 m 5.8 m 58.7% 6.0 m 2.5 m
INFO 15:22:55,322 ProgressMeter - chr11:596521 6.10e+05 4.0 m 6.6 m 58.7% 6.8 m 2.8 m
INFO 15:23:25,331 ProgressMeter - chr11:596521 6.10e+05 4.5 m 7.4 m 58.7% 7.7 m 3.2 m
INFO 15:23:55,341 ProgressMeter - chr11:596521 6.10e+05 5.0 m 8.2 m 58.7% 8.5 m 3.5 m
INFO 15:24:25,352 ProgressMeter - chr11:596521 6.10e+05 5.5 m 9.0 m 58.7% 9.4 m 3.9 m
INFO 15:24:55,362 ProgressMeter - chr11:605317 6.20e+05 6.0 m 9.7 m 58.7% 10.2 m 4.2 m
INFO 15:25:25,373 ProgressMeter - chr11:605317 6.20e+05 6.5 m 10.5 m 58.7% 11.1 m 4.6 m
INFO 15:25:55,382 ProgressMeter - chr11:605317 6.20e+05 7.0 m 11.3 m 58.7% 11.9 m 4.9 m

Answers

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie admin

    Hi Kevin,

    The most likely reason for the irregular runtime is that in regions that are not complex, ReduceReads (RR) has very little to do, so it runs very fast. When it hits complex regions, it has a lot more to evaluate, and that slows things down considerably. That explains the patchiness between regions. I'm a little more surprised at the difference you're seeing for the same contig between the two samples, if they really are comparable in quality and coverage. You may want to look at the areas that were particularly slow to see whether there are any local differences in complexity.
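One way to find "the areas that were particularly slow" is to mine the ProgressMeter lines already in the logs: consecutive lines give a time and a locus, so the bases advanced per second between them can be computed. The sketch below assumes the line format shown above. The function names and the idea of a bp/s cutoff are hypothetical choices for illustration, not part of GATK.

```python
import re
from typing import Iterator, List, Tuple

Point = Tuple[float, str, int]  # (elapsed seconds, contig, position)

# Matches ProgressMeter lines in the format shown in the logs, e.g.
# "INFO 15:19:55,262 ProgressMeter - chr11:573437 5.80e+05 60.0 s ..."
PROGRESS = re.compile(
    r"INFO\s+(\d{2}):(\d{2}):(\d{2}),(\d{3})\s+ProgressMeter - ([^\s:]+):(\d+)\s"
)


def parse_progress(lines: List[str]) -> List[Point]:
    """Extract (elapsed seconds, contig, position) from each progress line."""
    points: List[Point] = []
    day_offset = 0.0
    prev_clock = None
    for line in lines:
        m = PROGRESS.search(line)
        if m is None:
            continue  # header, banner and shard-loading lines carry no locus
        h, mi, s, ms = (int(g) for g in m.group(1, 2, 3, 4))
        clock = h * 3600 + mi * 60 + s + ms / 1000.0
        # Only the time of day is logged, so treat a backwards jump as a
        # midnight crossing (assumes <24 h between consecutive progress lines).
        if prev_clock is not None and clock < prev_clock:
            day_offset += 86400.0
        prev_clock = clock
        points.append((clock + day_offset, m.group(5), int(m.group(6))))
    if not points:
        return points
    t0 = points[0][0]
    return [(t - t0, contig, pos) for t, contig, pos in points]


def slow_intervals(
    points: List[Point], min_bp_per_sec: float
) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (contig, start, end, bp/s) for consecutive progress lines on the
    same contig where the locus advanced more slowly than min_bp_per_sec."""
    for (t1, c1, p1), (t2, c2, p2) in zip(points, points[1:]):
        if c1 != c2 or t2 <= t1:
            continue
        rate = (p2 - p1) / (t2 - t1)
        if rate < min_bp_per_sec:
            yield c1, p1, p2, rate
```

With an arbitrary cutoff of 1,000 bp/s, the Sample A excerpt above is flagged only between chr11:8,254,501 and chr11:8,287,403. In the problematic sample, every interval from chr11:557,669 onward falls below the cutoff. Flagged coordinates like these are natural places to start comparing local complexity between the two samples.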

  • Geraldine_VdAuwera (Cambridge, MA) · Member, Administrator, Broadie admin

    Also, consider using the -dcov downsampling option to speed things up in regions of high coverage.
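Downsampling helps because it bounds how many reads have to be considered wherever coverage spikes. The sketch below illustrates only the general idea: it caps the reads kept per alignment start using reservoir sampling. It is not GATK's downsampler. The `(contig, start, name)` read tuples, the function name and the cap value are hypothetical, and how `-dcov` is applied inside ReduceReads may differ.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical read representation (not a GATK/htsjdk type):
# (contig, alignment_start, read_name)
Read = Tuple[str, int, str]


def downsample_per_start(
    reads: Iterable[Read], max_per_start: int, seed: int = 0
) -> List[Read]:
    """Keep at most max_per_start reads per (contig, start), chosen uniformly
    at random with reservoir sampling so the input is streamed only once."""
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoirs: Dict[Tuple[str, int], List[Read]] = defaultdict(list)
    seen: Dict[Tuple[str, int], int] = defaultdict(int)
    for read in reads:
        key = (read[0], read[1])
        seen[key] += 1
        bucket = reservoirs[key]
        if len(bucket) < max_per_start:
            bucket.append(read)  # reservoir not full yet: always keep
        else:
            # Keep the n-th read with probability max_per_start / n,
            # evicting a uniformly chosen earlier read.
            j = rng.randrange(seen[key])
            if j < max_per_start:
                bucket[j] = read
    # Buckets come out in the order their start positions were first seen.
    return [r for bucket in reservoirs.values() for r in bucket]
```

Reservoir sampling picks a uniform random subset at each position without buffering every read there first, which is what keeps memory and work bounded in pileups with extreme depth.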
