
(BP1.4) Compression with ReduceReads [RETIRED]

Geraldine_VdAuwera, Administrator, GATK Developer (Posts: 6,192)

This is no longer part of our Best Practices recommendations as of version 3.0. Reduced BAMs cannot be used with tools from GATK version 3.0 and above.

Very large genome sequencing datasets can be a pain to work with. Beyond the problem of storing the large amounts of data involved, processing a lot of data simultaneously (for example, in multisample calling experiments that involve large cohorts) is a huge computational challenge. To mitigate this problem, we have developed a novel algorithm that compresses large portions of the read data into consensus reads. These consensus reads retain the information that is useful for variant calling (such as coverage depth and base and mapping quality scores) yet have a smaller computational footprint. The compression occurs in a single straightforward step that produces a new BAM file containing the reduced data. To be clear, this compression mode is NOT meant to provide a solution for long-term storage; you should always retain a copy of the unreduced data.

Using ReduceReads on your BAM files will cut their size to approximately 1/100 of the original, allowing the GATK to process tens of thousands of samples simultaneously without excessive I/O and processing burdens. Even for single samples, ReduceReads significantly cuts the memory requirements, I/O burden, and CPU costs of downstream tools (10x or more).
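As a rough illustration of the single-step compression described above, here is a minimal Python sketch that assembles a ReduceReads command line for a pre-3.0 GATK release. It only builds the argument list and does not run anything. The jar path and file names are hypothetical placeholders, and the sketch assumes the standard GATK 2.x engine arguments (`-T`, `-R`, `-I`, `-o`); check the documentation for your exact version before relying on it.

```python
def build_reduce_reads_command(gatk_jar, reference, input_bam, output_bam):
    """Return the argument list for a single ReduceReads run.

    Assumes a GATK 2.x jar; ReduceReads is retired as of version 3.0.
    All paths are supplied by the caller (the ones below are placeholders).
    """
    return [
        "java", "-jar", gatk_jar,
        "-T", "ReduceReads",   # tool name
        "-R", reference,       # reference FASTA
        "-I", input_bam,       # original (unreduced) BAM
        "-o", output_bam,      # new BAM containing the reduced data
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_reduce_reads_command(
        "GenomeAnalysisTK.jar", "ref.fasta", "sample.bam", "sample.reduced.bam"
    )
    print(" ".join(cmd))
```

Writing the reduced data to a separate output file, rather than overwriting the input, matches the advice above to always retain a copy of the unreduced BAM.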


Geraldine Van der Auwera, PhD

Comments

  • michaelchao, Member (Posts: 13)

    Hi,

    I have been running the GATK Best Practices on exome sequencing samples. In the output of ReduceReads (log excerpt below), why are many chromosomal regions towards the end labeled with GL000, and what are they?

    Thank you for your help.

    INFO 21:40:12,815 ProgressMeter - 20:61595684 3.43e+07 58.6 m 102.0 s 89.6% 65.4 m 6.8 m
    INFO 21:40:42,816 ProgressMeter - 21:48119465 3.49e+07 59.1 m 101.0 s 91.2% 64.8 m 5.7 m
    INFO 21:41:12,818 ProgressMeter - 22:41747759 3.54e+07 59.6 m 101.0 s 92.6% 64.4 m 4.8 m
    INFO 21:41:42,819 ProgressMeter - X:38129038 3.58e+07 60.1 m 100.0 s 94.1% 63.9 m 3.8 m
    INFO 21:42:12,821 ProgressMeter - X:38129038 3.58e+07 60.6 m 101.0 s 94.1% 64.4 m 3.8 m
    INFO 21:42:42,822 ProgressMeter - X:117378064 3.63e+07 61.1 m 101.0 s 96.7% 63.2 m 2.1 m
    INFO 21:43:12,823 ProgressMeter - GL000233.1:14511 3.69e+07 61.6 m 100.0 s 99.8% 61.7 m 6.0 s
    INFO 21:43:42,824 ProgressMeter - GL000192.1:541162 3.72e+07 62.1 m 100.0 s 100.0% 62.1 m 0.0 s
    INFO 21:44:13,000 ProgressMeter - GL000192.1:541162 3.72e+07 62.6 m 101.0 s 100.0% 62.6 m 0.0 s
    INFO 21:44:43,001 ProgressMeter - GL000192.1:541162 3.72e+07 63.1 m 101.0 s 100.0% 63.1 m 0.0 s
    INFO 21:45:13,002 ProgressMeter - GL000192.1:541162 3.72e+07 63.6 m 102.0 s 100.0% 63.6 m 0.0 s
    INFO 21:45:43,003 ProgressMeter - GL000192.1:541162 3.72e+07 64.1 m 103.0 s 100.0% 64.1 m 0.0 s
    INFO 21:46:13,005 ProgressMeter - GL000192.1:541162 3.72e+07 64.6 m 104.0 s 100.0% 64.6 m 0.0 s
    INFO 21:46:43,006 ProgressMeter - GL000192.1:541162 3.72e+07 65.1 m 105.0 s 100.0% 65.1 m 0.0 s
    INFO 21:47:13,007 ProgressMeter - GL000192.1:541162 3.72e+07 65.6 m 105.0 s 100.0% 65.6 m 0.0 s
    INFO 21:47:43,008 ProgressMeter - GL000192.1:541162 3.72e+07 66.1 m 106.0 s 100.0% 66.1 m 0.0 s
    INFO 21:48:13,009 ProgressMeter - GL000192.1:541162 3.72e+07 66.6 m 107.0 s 100.0% 66.6 m 0.0 s
    INFO 21:48:43,011 ProgressMeter - GL000192.1:541162 3.72e+07 67.1 m 108.0 s 100.0% 67.1 m 0.0 s
    INFO 21:49:13,012 ProgressMeter - GL000192.1:541162 3.72e+07 67.6 m 109.0 s 100.0% 67.6 m 0.0 s
    INFO 21:49:43,014 ProgressMeter - GL000192.1:541162 3.72e+07 68.1 m 110.0 s 100.0% 68.1 m 0.0 s
    INFO 21:50:13,015 ProgressMeter - GL000192.1:541162 3.72e+07 68.6 m 110.0 s 100.0% 68.6 m 0.0 s
    INFO 21:50:43,016 ProgressMeter - GL000192.1:541162 3.72e+07 69.1 m 111.0 s 100.0% 69.1 m 0.0 s
    INFO 21:51:13,017 ProgressMeter - GL000192.1:541162 3.72e+07 69.6 m 112.0 s 100.0% 69.6 m 0.0 s
    INFO 21:51:43,019 ProgressMeter - GL000192.1:541162 3.72e+07 70.1 m 113.0 s 100.0% 70.1 m 0.0 s
    INFO 21:52:13,020 ProgressMeter - GL000192.1:541162 3.72e+07 70.6 m 114.0 s 100.0% 70.6 m 0.0 s
    INFO 21:52:43,021 ProgressMeter - GL000192.1:541162 3.72e+07 71.1 m 114.0 s 100.0% 71.1 m 0.0 s
    INFO 21:53:13,022 ProgressMeter - GL000192.1:541162 3.72e+07 71.6 m 115.0 s 100.0% 71.6 m 0.0 s
    INFO 21:53:43,024 ProgressMeter - GL000192.1:541162 3.72e+07 72.1 m 116.0 s 100.0% 72.1 m 0.0 s
    INFO 21:54:13,025 ProgressMeter - GL000192.1:541162 3.72e+07 72.6 m 117.0 s 100.0% 72.6 m 0.0 s
    INFO 21:54:43,026 ProgressMeter - GL000192.1:541162 3.72e+07 73.1 m 118.0 s 100.0% 73.1 m 0.0 s
    INFO 21:55:13,027 ProgressMeter - GL000192.1:541162 3.72e+07 73.6 m 118.0 s 100.0% 73.6 m 0.0 s
    INFO 21:55:43,029 ProgressMeter - GL000192.1:541162 3.72e+07 74.1 m 119.0 s 100.0% 74.1 m 0.0 s
    INFO 21:56:13,030 ProgressMeter - GL000192.1:541162 3.72e+07 74.6 m 120.0 s 100.0% 74.6 m 0.0 s
    INFO 21:56:43,031 ProgressMeter - GL000192.1:541162 3.72e+07 75.1 m 2.0 m 100.0% 75.1 m 0.0 s
    INFO 21:57:13,033 ProgressMeter - GL000192.1:541162 3.72e+07 75.6 m 2.0 m 100.0% 75.6 m 0.0 s
    INFO 21:57:43,037 ProgressMeter - GL000192.1:541162 3.72e+07 76.1 m 2.0 m 100.0% 76.1 m 0.0 s
    INFO 21:58:13,038 ProgressMeter - GL000192.1:541162 3.72e+07 76.6 m 2.1 m 100.0% 76.6 m 0.0 s
    INFO 21:58:43,039 ProgressMeter - GL000192.1:541162 3.72e+07 77.1 m 2.1 m 100.0% 77.1 m 0.0 s
    INFO 21:59:13,040 ProgressMeter - GL000192.1:541162 3.72e+07 77.6 m 2.1 m 100.0% 77.6 m 0.0 s
    INFO 21:59:43,042 ProgressMeter - GL000192.1:541162 3.72e+07 78.1 m 2.1 m 100.0% 78.1 m 0.0 s
    INFO 22:00:13,043 ProgressMeter - GL000192.1:541162 3.72e+07 78.6 m 2.1 m 100.0% 78.6 m 0.0 s
    INFO 22:00:43,044 ProgressMeter - GL000192.1:541162 3.72e+07 79.1 m 2.1 m 100.0% 79.1 m 0.0 s
    INFO 22:01:13,046 ProgressMeter - GL000192.1:541162 3.72e+07 79.6 m 2.1 m 100.0% 79.6 m 0.0 s
    INFO 22:01:43,047 ProgressMeter - GL000192.1:541162 3.72e+07 80.1 m 2.2 m 100.0% 80.1 m 0.0 s
    INFO 22:02:13,048 ProgressMeter - GL000192.1:541162 3.72e+07 80.6 m 2.2 m 100.0% 80.6 m 0.0 s
    INFO 22:02:43,049 ProgressMeter - GL000192.1:541162 3.72e+07 81.1 m 2.2 m 100.0% 81.1 m 0.0 s
    INFO 22:03:13,050 ProgressMeter - GL000192.1:541162 3.72e+07 81.6 m 2.2 m 100.0% 81.6 m 0.0 s
    INFO 22:03:43,052 ProgressMeter - GL000192.1:541162 3.72e+07 82.1 m 2.2 m 100.0% 82.1 m 0.0 s
    INFO 22:04:13,053 ProgressMeter - GL000192.1:541162 3.72e+07 82.6 m 2.2 m 100.0% 82.6 m 0.0 s
    INFO 22:04:43,054 ProgressMeter - GL000192.1:541162 3.72e+07 83.1 m 2.2 m 100.0% 83.1 m 0.0 s
    INFO 22:05:13,055 ProgressMeter - GL000192.1:541162 3.72e+07 83.6 m 2.3 m 100.0% 83.6 m 0.0 s
    INFO 22:05:43,056 ProgressMeter - GL000192.1:541162 3.72e+07 84.1 m 2.3 m 100.0% 84.1 m 0.0 s
    INFO 22:06:13,058 ProgressMeter - GL000192.1:541162 3.72e+07 84.6 m 2.3 m 100.0% 84.6 m 0.0 s
    INFO 22:06:43,059 ProgressMeter - GL000192.1:541162 3.72e+07 85.1 m 2.3 m 100.0% 85.1 m 0.0 s
    INFO 22:07:13,060 ProgressMeter - GL000192.1:541162 3.72e+07 85.6 m 2.3 m 100.0% 85.6 m 0.0 s
    INFO 22:07:43,062 ProgressMeter - GL000192.1:541162 3.72e+07 86.1 m 2.3 m 100.0% 86.1 m 0.0 s
    INFO 22:08:13,063 ProgressMeter - GL000192.1:541162 3.72e+07 86.6 m 2.3 m 100.0% 86.6 m 0.0 s
    INFO 22:08:43,064 ProgressMeter - GL000192.1:541162 3.72e+07 87.1 m 2.3 m 100.0% 87.1 m 0.0 s
    INFO 22:09:13,065 ProgressMeter - GL000192.1:541162 3.72e+07 87.6 m 2.4 m 100.0% 87.6 m 0.0 s
    INFO 22:09:43,067 ProgressMeter - GL000192.1:541162 3.72e+07 88.1 m 2.4 m 100.0% 88.1 m 0.0 s
    INFO 22:10:13,068 ProgressMeter - GL000192.1:541162 3.72e+07 88.6 m 2.4 m 100.0% 88.6 m 0.0 s

  • Geraldine_VdAuwera, Administrator, GATK Developer (Posts: 6,192)

    They are contigs that correspond to unplaced scaffolds in the human genome map. See e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3683849/ for more information.

    Geraldine Van der Auwera, PhD
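    For readers who want to see which contigs a run has walked through, the following is a minimal Python sketch that pulls contig names out of ProgressMeter lines like the ones posted above and flags the GL-prefixed scaffolds. The function names and the regular expression are illustrative assumptions based only on the log format shown in this thread; this is not part of the GATK.

```python
import re

# Matches the "contig:position" token that follows "ProgressMeter - "
# in log lines of the form shown in this thread (an assumption).
LOCUS_PATTERN = re.compile(r"ProgressMeter - (\S+?):\d+")


def contigs_in_log(log_text):
    """Return contig names in order of first appearance, without repeats."""
    names = LOCUS_PATTERN.findall(log_text)
    return list(dict.fromkeys(names))


def is_gl_scaffold(contig):
    """True for GL-prefixed scaffold contigs such as GL000233.1."""
    return contig.startswith("GL")


if __name__ == "__main__":
    sample = (
        "INFO 21:42:42,822 ProgressMeter - X:117378064 3.63e+07 61.1 m 101.0 s 96.7% 63.2 m 2.1 m\n"
        "INFO 21:43:12,823 ProgressMeter - GL000233.1:14511 3.69e+07 61.6 m 100.0 s 99.8% 61.7 m 6.0 s\n"
        "INFO 21:43:42,824 ProgressMeter - GL000192.1:541162 3.72e+07 62.1 m 100.0 s 100.0% 62.1 m 0.0 s\n"
    )
    for contig in contigs_in_log(sample):
        label = "GL scaffold" if is_gl_scaffold(contig) else "chromosome"
        print(contig, label)
```

    The regular expression is applied to the whole text rather than line by line, so it also works on logs whose line breaks were lost when pasting.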

This discussion has been closed.