SVAltAlign Queue script

Geraldine_VdAuwera (Cambridge, MA)
edited September 2012 in GenomeSTRiP Documentation

1. Introduction

SVAltAlign.q is a sample Queue script that is part of Genome STRiP.

This script realigns previously unmapped reads against putative alternate
alleles generated from a VCF file describing a set of variants to be
genotyped. The output is a merged bam file that contains these alignments to
the alternate alleles. These alternate allele alignments are then used as
input to genotyping.

2. Inputs / Arguments

  • -vcf <input-vcf-file> : A VCF file containing descriptions of the
    structural variations. Only records for structural variations with precise
    breakpoints will be processed.

  • -I <bam-file> : The set of input BAM files containing records to realign.

  • -md <directory> : The metadata directory containing metadata about the
    input data set.

  • -R <fasta-file> : The reference sequence, as an indexed fasta file. The
    fasta file must be indexed with samtools faidx or the equivalent.

  • -altAlleleFlankLength <n> : The length of flanking sequence from the
    reference genome used during realignment (default 200).

  • -alignUnmappedMates <boolean> : Whether to align unmapped mates of mapped
    reads to the alternate alleles (default true). If false, then unmapped reads
    with a POS field will be ignored.

  • -configFile <configuration-file> : This file contains values for
    specialized settings that do not normally need to be changed. A default
    configuration file is provided in conf/genstrip_parameters.txt.
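
Before launching the pipeline, it can help to confirm that the inputs listed above are in place. The following is a minimal sketch, not part of Genome STRiP: check_inputs is a hypothetical helper name, and it assumes the fasta index sits next to the reference as <fasta>.fai, which is where samtools faidx writes it.

```shell
#!/bin/sh
# Pre-flight checks for SVAltAlign.q inputs.
# check_inputs is an illustrative helper, not part of Genome STRiP.

# Usage: check_inputs REF VCF MD_DIR BAM...
# Prints one line per problem found; returns non-zero if any was found.
check_inputs() {
    ref=$1; vcf=$2; md=$3
    shift 3
    status=0

    [ -f "$ref" ] || { echo "missing reference: $ref"; status=1; }
    # samtools faidx writes its index next to the fasta as <fasta>.fai
    [ -f "$ref.fai" ] || { echo "reference not indexed (try: samtools faidx $ref)"; status=1; }
    [ -f "$vcf" ] || { echo "missing VCF: $vcf"; status=1; }
    [ -d "$md" ] || { echo "missing metadata directory: $md"; status=1; }
    for bam in "$@"; do
        [ -f "$bam" ] || { echo "missing BAM: $bam"; status=1; }
    done
    return $status
}
```

For example, check_inputs Homo_sapiens_assembly18.fasta input.vcf metadata input1.bam input2.bam prints nothing and returns 0 when every input is present.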

3. Outputs

  • -O <bam-file> : The default output for this pipeline is a single merged
    bam file for all input bam files and all alternate alleles. The sequence
    identifier for an alternate allele is VariantID_N where N is the index of the
    alternate allele in the VCF file (i.e. the first alternate allele is allele

4. Running

The SVAltAlign.q script is run through Queue.

Because Genome STRiP is a third-party GATK library, the Queue command line
must be invoked explicitly, as shown in the example below.

java -Xmx2g -cp Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sting.queue.QCommandLine \
    -S SVAltAlign.q \
    -S SVQScript.q \
    -gatk GenomeAnalysisTK.jar \
    -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    -configFile /path/to/svtoolkit/conf/genstrip_parameters.txt \
    -tempDir /path/to/tmp/dir \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -vcf input.vcf \
    -I input1.bam -I input2.bam \
    -O output.bam \
    -run \
    -bsub \
    -jobQueue gsa \
    -jobProject 1KG \
    -jobLogDir logs

5. Typical Queue Arguments

Queue typically requires the following arguments to run Genome STRiP:

  • -run : Actually run the pipeline (default is to do a dry run).

  • -S <queue-script> : Script to run. The base script SVQScript.q from the
    SVToolkit should also be specified with a separate -S argument.

  • -gatk <jar-file> : The path to the GATK jar file.

  • -cp <classpath> : The java classpath to use for pipeline commands. This
    must include SVToolkit.jar and GenomeAnalysisTK.jar. Note: Both -cp
    arguments are required in the example command. The first -cp argument is for
    the invocation of Queue itself, the second -cp argument is for the invocation
    of pipeline processes that will be run by Queue.

  • -tempDir <directory> : Path to a directory to use for temporary files.
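
The two -cp arguments and the dry-run default are easy to get wrong, so here is a small sketch that assembles the command line from section 4 as a string and appends -run only on request. SV_CP and queue_cmd are names invented for this example, and the file names and paths are the placeholders from the example command; the LSF arguments are left out for brevity.

```shell
#!/bin/sh
# Sketch: print the Queue command line, adding -run only when asked.
# SV_CP and queue_cmd are hypothetical names used for illustration.

# Jars needed by the pipeline processes that Queue launches.
SV_CP="SVToolkit.jar:GenomeAnalysisTK.jar"

# Usage: queue_cmd MODE   (MODE is "dry" or "run")
queue_cmd() {
    mode=$1
    # First -cp: classpath for Queue itself, so it also needs Queue.jar.
    cmd="java -Xmx2g -cp Queue.jar:$SV_CP org.broadinstitute.sting.queue.QCommandLine"
    cmd="$cmd -S SVAltAlign.q -S SVQScript.q"
    cmd="$cmd -gatk GenomeAnalysisTK.jar"
    # Second -cp: classpath passed on to the pipeline processes.
    cmd="$cmd -cp $SV_CP"
    cmd="$cmd -configFile /path/to/svtoolkit/conf/genstrip_parameters.txt"
    cmd="$cmd -tempDir /path/to/tmp/dir"
    cmd="$cmd -md metadata -R Homo_sapiens_assembly18.fasta"
    cmd="$cmd -vcf input.vcf -I input1.bam -I input2.bam -O output.bam"
    # Without -run, Queue only performs a dry run.
    if [ "$mode" = "run" ]; then
        cmd="$cmd -run"
    fi
    echo "$cmd"
}
```

Calling queue_cmd dry prints the command without -run, which is a convenient way to review it before switching to queue_cmd run. The sketch does no quoting, so paths containing spaces would need extra care.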

6. Queue LSF Arguments

  • -bsub : Use LSF to submit jobs.

  • -jobQueue <queue-name> : LSF queue to use.

  • -jobProject <project-name> : LSF project to use for accounting.

  • -jobLogDir <directory> : Directory for LSF log files.
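
The LSF arguments above can be grouped in one place so they are easy to switch per cluster. This is a sketch only: lsf_args is a hypothetical helper, and creating the log directory up front is a precaution, since this page does not say whether Queue creates it.

```shell
#!/bin/sh
# Usage: lsf_args QUEUE PROJECT LOGDIR
# Creates LOGDIR if needed and prints the LSF-related Queue arguments.
# lsf_args is an illustrative helper, not part of Queue or Genome STRiP.
lsf_args() {
    # Precaution: make sure the log directory exists before jobs are submitted.
    mkdir -p "$3" || return 1
    echo "-bsub -jobQueue $1 -jobProject $2 -jobLogDir $3"
}
```

With the values from the example command, lsf_args gsa 1KG logs prints the four LSF arguments in the order used there.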
