Bug Bulletin: The GenomeLocPArser error in SplitNCigarReads has been fixed; if you encounter it, use the latest nightly build.

(howto) Compress read data with ReduceReads

Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,176Administrator, GATK Developer admin
edited June 2013 in Tutorials

Objective

Compress the read data in order to minimize file sizes, which facilitates massively multisample processing.

Prerequisites

  • TBD

Steps

  1. Compress your sequence data

1. Compress your sequence data

Action

Run the following GATK command:

java -jar GenomeAnalysisTK.jar \ 
    -T ReduceReads \ 
    -R reference.fa \ 
    -I recal_reads.bam \ 
    -L 20 \ 
    -o reduced_reads.bam 

Expected Result

This creates a file called reduced_reads.bam containing only the sequence information that is essential for calling variants.

Note that ReduceReads is not meant to be run on multiple samples at once. If you plan on merging your sample bam files, you should run ReduceReads on individual samples before doing so.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

Comments

Sign In or Register to comment.