SVAltAlign Queue script

Posted by Geraldine_VdAuwera (Administrator, GATK Dev)
Edited September 2012 in GenomeSTRiP Documentation

1. Introduction

SVAltAlign.q is a sample Queue script that is part of Genome STRiP.

This script realigns previously unmapped reads against putative alternate
alleles generated from a VCF file describing a set of variants to be
genotyped. The output is a merged BAM file that contains these alignments to
the alternate alleles. These alternate allele alignments are then used as
input to genotyping.

2. Inputs / Arguments

  • -vcf <input-vcf-file> : A VCF file containing descriptions of the
    structural variations. Only records for structural variations with precise
    breakpoints will be processed.

  • -I <bam-file> : The set of input BAM files containing records to realign.

  • -md <directory> : The metadata directory containing metadata about the
    input data set.

  • -R <fasta-file> : The reference sequence, as an indexed fasta file. The
    fasta file must be indexed with samtools faidx or the equivalent.

  • -altAlleleFlankLength <n> : The length of flanking sequence from the
    reference genome used during realignment (default 200).

  • -alignUnmappedMates <boolean> : Whether to align unmapped mates of mapped
    reads to the alternate alleles (default true). If false, unmapped reads
    with a POS field will be ignored.

  • -configFile <configuration-file> : This file contains values for
    specialized settings that do not normally need to be changed. A default
    configuration file is provided in conf/genstrip_parameters.txt.
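Since -R requires an indexed fasta file, it can be worth checking for the index before submitting a long pipeline run. The following is a minimal Python sketch, not part of Genome STRiP. It assumes the index sits next to the fasta file with a .fai suffix, which is where samtools faidx writes it; the function names are hypothetical.

```python
import os


def fasta_index_path(fasta_path):
    """Return the path where samtools faidx writes the index for fasta_path."""
    return fasta_path + ".fai"


def reference_is_indexed(fasta_path):
    """True only if both the fasta file and its .fai index exist on disk."""
    return os.path.isfile(fasta_path) and os.path.isfile(fasta_index_path(fasta_path))
```

If reference_is_indexed returns False for an existing fasta file, running samtools faidx on that file creates the missing index.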

3. Outputs

  • -O <bam-file> : The default output for this pipeline is a single merged
    BAM file for all input BAM files and all alternate alleles. The sequence
    identifier for an alternate allele is VariantID_N, where N is the index of
    the alternate allele in the VCF file (i.e., the first alternate allele is
    allele 1).
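The VariantID_N naming rule can be illustrated with a short Python sketch. This is only an illustration of the convention described above, not code from Genome STRiP; the function names and the variant ID used in the example are hypothetical. Splitting on the last underscore is a design choice made here so that variant IDs which themselves contain underscores are still handled.

```python
def alt_allele_name(variant_id, allele_index):
    """Build the sequence identifier for the N-th alternate allele (1-based)."""
    if allele_index < 1:
        raise ValueError("alternate allele indexes start at 1")
    return f"{variant_id}_{allele_index}"


def split_alt_allele_name(name):
    """Recover (variant_id, allele_index) by splitting on the last underscore."""
    variant_id, index = name.rsplit("_", 1)
    return variant_id, int(index)
```

For example, alt_allele_name("DEL_P0001_12", 1) gives "DEL_P0001_12_1", and split_alt_allele_name reverses it.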

4. Running

The SVAltAlign.q script is run through Queue.

Because Genome STRiP is a third-party GATK library, the Queue command line
must be invoked explicitly, as shown in the example below.

java -Xmx2g -cp Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sting.queue.QCommandLine \
    -S SVAltAlign.q \
    -S SVQScript.q \
    -gatk GenomeAnalysisTK.jar \
    -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    -configFile /path/to/svtoolkit/conf/genstrip_parameters.txt \
    -tempDir /path/to/tmp/dir \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -vcf input.vcf \
    -I input1.bam -I input2.bam \
    -O output.bam \
    -run \
    -bsub \
    -jobQueue gsa \
    -jobProject 1KG \
    -jobLogDir logs

5. Typical Queue Arguments

Queue typically requires the following arguments to run Genome STRiP
pipelines.

  • -run : Actually run the pipeline (default is to do a dry run).

  • -S <queue-script> : Script to run. The base script SVQScript.q from the
    SVToolkit should also be specified with a separate -S argument.

  • -gatk <jar-file> : The path to the GATK jar file.

  • -cp <classpath> : The Java classpath to use for pipeline commands. This
    must include SVToolkit.jar and GenomeAnalysisTK.jar. Note: Both -cp
    arguments are required in the example command. The first -cp argument is
    for the invocation of Queue itself; the second -cp argument is for the
    invocation of pipeline processes that will be run by Queue.

  • -tempDir <directory> : Path to a directory to use for temporary files.
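The two -cp arguments are easy to confuse, so here is a hedged Python sketch of how a wrapper might assemble the argument list. It is an assumption-laden illustration rather than an official launcher: the function name and its parameters are hypothetical, the jar names are taken from the example command, and the ":" classpath separator assumes a Unix-like system.

```python
def queue_command(script, classpath_jars, gatk_jar, temp_dir, script_args, run=False):
    """Assemble the argument list for one Queue invocation.

    classpath_jars is used twice: once for the JVM that runs Queue (with
    Queue.jar prepended) and once as the Queue-level -cp for pipeline jobs.
    """
    pipeline_cp = ":".join(classpath_jars)
    argv = [
        "java", "-Xmx2g",
        "-cp", "Queue.jar:" + pipeline_cp,   # first -cp: classpath for Queue itself
        "org.broadinstitute.sting.queue.QCommandLine",
        "-S", script,
        "-S", "SVQScript.q",                 # base script from the SVToolkit
        "-gatk", gatk_jar,
        "-cp", pipeline_cp,                  # second -cp: classpath for pipeline jobs
        "-tempDir", temp_dir,
    ]
    argv += list(script_args)                # script-specific arguments, e.g. -vcf, -I, -O
    if run:
        argv.append("-run")                  # without -run, Queue only does a dry run
    return argv
```

The returned list can be passed to subprocess.run. Leaving run=False mirrors Queue's default dry-run behaviour, which is a convenient way to check the arguments before submitting real jobs.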

6. Queue LSF Arguments

  • -bsub : Use LSF to submit jobs.

  • -jobQueue <queue-name> : LSF queue to use.

  • -jobProject <project-name> : LSF project to use for accounting.

  • -jobLogDir <directory> : Directory for LSF log files.
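When the same pipeline is launched on several clusters, it can be handy to keep the LSF options in one place. The sketch below is a small Python helper under that assumption; the function name is hypothetical, and only the four flags listed above are used.

```python
def lsf_arguments(job_queue, job_project, job_log_dir):
    """Return the Queue arguments that submit jobs through LSF."""
    return [
        "-bsub",                      # use LSF to submit jobs
        "-jobQueue", job_queue,       # LSF queue name
        "-jobProject", job_project,   # LSF project used for accounting
        "-jobLogDir", job_log_dir,    # directory for LSF log files
    ]
```

With the values from the example command, lsf_arguments("gsa", "1KG", "logs") yields the last four options of that command line.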


Geraldine Van der Auwera, PhD
