The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.10.4 has MAJOR CHANGES that impact throughput of pipelines. Default compression is now 1 instead of 5, and Picard now handles compressed data with the Intel Deflator/Inflator instead of JDK.
GATK version 4.beta.3 (i.e. the third beta release) is out. See the github release page for download and details.


Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
edited January 2013 in GenomeSTRiP Documentation

1. Introduction

The ComputeGenomeMask utility determines the alignability of each base in the
reference genome.

Mask files are generated based on a fixed read length L. A base is considered
alignable if a window of length L centered on the base is unique within the
reference sequence.

The ComputeGenomeMask utility works by dividing the input genome into
sequences of length L for every base position and then aligning these
sequences against the reference genome (using BWA) and looking for unique

The implementation of genome masking in Genome STRiP is likely to change in
the future.

2. Inputs / Arguments

  • -R <fasta-file> : Reference sequence. : An indexed fasta file containing
    the reference sequence to mask. : The fasta file must have been indexed using
    BWA in preparation for BWA alignment. The fasta file must also be indexed
    with 'samtools faidx' or the equivalent.

  • -readLength <length> : The size of the window to use for determining

  • -sequence <sequenceName> : The name of the sequence to process (default is
    to process the entire genome). This can be used to parallelize the
    computation by chromosome.

3. Outputs

  • -O <mask-file> : Output genome mask file (fasta format). Default is to
    write to stdout. The mask file contains '0' at alignable positions and '1'
    at non-alignable positions.

4. Running

ComputeGenomeMask is part of Genome STRiP, which is a third-party GATK library.

An example invocation is shown below:


java -Xmx2g -cp SVToolkit.jar:GenomeAnalysisTK.jar \ \ 
    -R Homo_sapiens_assembly18.fasta \ 
    -O Homo_sapiens_assembly18.mask.chr1.36.fasta \ 
    -readLength 36 \ 
    -sequence chr1 

To generate a mask for the entire genome, it is generally preferably to
compute the mask for each chromosome separately and then concatenate the
output files in the correct order.

The following script provides an example of generating the appropiate indexes
and running ComputeGenomeMask in parallel on each chromosome. This is an
example and not a general purpose script; it will likely need to be modified
for your environment. After each chromosome has been run, the resulting files
must be concatenated together in the same order as the reference sequence to
create the final mask.


outdir=example readLength=75 reference=Homo_sapiens_hg19.fasta
export SV_DIR=/humgen/cnp04/bobh/svtoolkit/stable

#These executables must be on your path.
which java > /dev/null || exit 1
which bwa > /dev/null || exit 1
#The directory containing must be on your LD_LIBRARY_PATH

mkdir -p ${outdir}/work

localReference=${outdir}/work/`echo ${reference} | awk -F / '{ print $NF }'`
if [ ! -e ${localReference} ]; then ln ${reference} ${localReference} || exit 1 fi

java -cp ${classpath} -Xmx4g \ \ 
    -I ${localReference} \ 
    -O ${localReference}.fai \
    || exit 1

bwa index -a bwtsw ${localReference} || exit 1

chroms=`cat ${localReference}.fai | cut -f 1`
for chr in ${chroms}; do
    bsub -o ${outdir}/work/svmask_${chr}.log \ 
    -R "rusage[mem=5000]" \ 
    java -cp ${classpath} -Xmx4g \ \ 
        -R ${localReference} \ 
        -O ${outdir}/work/svmask_${chr}.fasta \ 
        -readLength ${readLength} \ 
        -sequence ${chr} \
        || exit 1
Post edited by bhandsaker on
Sign In or Register to comment.