SVAltAlign Queue script
SVAltAlign.q is a sample Queue script that is part of Genome STRiP.
This script realigns previously unmapped reads against putative alternate
alleles generated from a VCF file describing a set of variants to be
genotyped. The output is a merged bam file that contains these alignments to
the alternate alleles. These alternate allele alignments are then used as
input to genotyping.
2. Inputs / Arguments
-vcf <input-vcf-file>: A VCF file containing descriptions of the
structural variations. Only records for structural variations with precise
breakpoints will be processed.
-I <bam-file>: The set of input BAM files containing records to realign.
-md <directory>: The metadata directory containing metadata about the
input data set.
-R <fasta-file>: Reference sequence. An indexed fasta file containing
the reference sequence. The fasta file must be indexed.
-altAlleleFlankLength <n>: The length of flanking sequence from the
reference genome used during realignment (default 200).
-alignUnmappedMates <boolean>: Whether to align unmapped mates of mapped
reads to the alternate alleles (default true). If false, then unmapped reads
with a POS field will be ignored.
-configFile <configuration-file>: This file contains values for
specialized settings that do not normally need to be changed. A default
configuration file is provided in conf/genstrip_parameters.txt.
-O <bam-file>: The default output for this pipeline is a single merged
bam file for all input bam files and all alternate alleles. The sequence
identifier for each alternate allele incorporates N, the index of that
alternate allele in the VCF file.

The SVAltAlign.q script is run through Queue.
Because Genome STRiP is a third-party GATK library, the Queue command line
must be invoked explicitly, as shown in the example below.
java -Xmx2g -cp Queue.jar:SVToolkit.jar:GenomeAnalysisTK.jar \
    org.broadinstitute.sting.queue.QCommandLine \
    -S SVAltAlign.q \
    -S SVQScript.q \
    -gatk GenomeAnalysisTK.jar \
    -cp SVToolkit.jar:GenomeAnalysisTK.jar \
    -configFile /path/to/svtoolkit/conf/genstrip_parameters.txt \
    -tempDir /path/to/tmp/dir \
    -md metadata \
    -R Homo_sapiens_assembly18.fasta \
    -vcf input.vcf \
    -I input1.bam -I input2.bam \
    -O output.bam \
    -run \
    -bsub \
    -jobQueue gsa \
    -jobProject 1KG \
    -jobLogDir logs
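When this command is generated from a larger workflow, it can be easier to
assemble it as an argument list than as one long shell string. The following
is a minimal Python sketch, not part of Genome STRiP: the function name
build_svaltalign_command and its parameters are hypothetical, while the flags
and jar names are copied from the example command. Leaving out -run produces
a dry run.

```python
import shlex


def build_svaltalign_command(vcf, bams, output_bam, reference, metadata_dir,
                             config_file, temp_dir, run=False):
    """Assemble the Queue invocation for SVAltAlign.q as an argument list.

    Hypothetical helper: only the flags and jar names come from the
    documented example command.
    """
    pipeline_cp = "SVToolkit.jar:GenomeAnalysisTK.jar"
    cmd = [
        "java", "-Xmx2g",
        # First -cp: classpath for the invocation of Queue itself.
        "-cp", "Queue.jar:" + pipeline_cp,
        "org.broadinstitute.sting.queue.QCommandLine",
        "-S", "SVAltAlign.q",
        "-S", "SVQScript.q",
        "-gatk", "GenomeAnalysisTK.jar",
        # Second -cp: classpath for the pipeline processes run by Queue.
        "-cp", pipeline_cp,
        "-configFile", config_file,
        "-tempDir", temp_dir,
        "-md", metadata_dir,
        "-R", reference,
        "-vcf", vcf,
    ]
    # One -I argument per input BAM file.
    for bam in bams:
        cmd += ["-I", bam]
    cmd += ["-O", output_bam]
    # Without -run, Queue only performs a dry run.
    if run:
        cmd.append("-run")
    return cmd


if __name__ == "__main__":
    dry_run = build_svaltalign_command(
        vcf="input.vcf",
        bams=["input1.bam", "input2.bam"],
        output_bam="output.bam",
        reference="Homo_sapiens_assembly18.fasta",
        metadata_dir="metadata",
        config_file="/path/to/svtoolkit/conf/genstrip_parameters.txt",
        temp_dir="/path/to/tmp/dir",
    )
    # Print a shell-safe rendering of the command for inspection.
    print(" ".join(shlex.quote(arg) for arg in dry_run))
```

The returned list can be passed directly to subprocess.run, which avoids
shell quoting problems with paths that contain spaces.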
5. Typical Queue Arguments
Queue typically requires the following arguments to run Genome STRiP:
-run: Actually run the pipeline (default is to do a dry run).
-S <queue-script>: Script to run. The base script SVQScript.q from the
SVToolkit should also be specified with a separate -S argument.
-gatk <jar-file>: The path to the GATK jar file.
-cp <classpath>: The java classpath to use for pipeline commands. This
must include SVToolkit.jar and GenomeAnalysisTK.jar. Note: Both -cp
arguments are required in the example command. The first -cp argument is for
the invocation of Queue itself, the second -cp argument is for the invocation
of pipeline processes that will be run by Queue.
-tempDir <directory>: Path to a directory to use for temporary files.
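Because the pipeline classpath must name both jars, a small pre-flight check
can catch a misconfigured -cp value before Queue starts. The sketch below is
illustrative only: missing_pipeline_jars is a hypothetical helper, and it
assumes the Unix ":" separator used in the example command.

```python
import os

# Jar files that the pipeline classpath must include.
REQUIRED_JARS = ("SVToolkit.jar", "GenomeAnalysisTK.jar")


def missing_pipeline_jars(classpath, separator=":"):
    """Return the required jar names that are absent from a classpath string."""
    # Compare on file names so entries given as full paths also match.
    present = {
        os.path.basename(entry)
        for entry in classpath.split(separator)
        if entry
    }
    return [jar for jar in REQUIRED_JARS if jar not in present]


if __name__ == "__main__":
    missing = missing_pipeline_jars("/path/to/SVToolkit.jar")
    if missing:
        print("pipeline classpath is missing:", ", ".join(missing))
```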
6. Queue LSF Arguments
-bsub: Use LSF to submit jobs.
-jobQueue <queue-name>: LSF queue to use.
-jobProject <project-name>: LSF project to use for accounting.
-jobLogDir <directory>: Directory for LSF log files.
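The LSF options are separate from the rest of the command line, so a wrapper
can keep them in one place and append them only when jobs should be submitted
to the cluster. A minimal Python sketch follows; lsf_arguments is a
hypothetical helper name, and the queue, project, and log directory values
are the ones from the example command.

```python
def lsf_arguments(job_queue, job_project, job_log_dir):
    """Return the Queue arguments that submit jobs through LSF."""
    return [
        "-bsub",                      # use LSF to submit jobs
        "-jobQueue", job_queue,       # LSF queue to use
        "-jobProject", job_project,   # LSF project used for accounting
        "-jobLogDir", job_log_dir,    # directory for LSF log files
    ]


if __name__ == "__main__":
    # These arguments would be appended to the rest of the Queue command.
    print(" ".join(lsf_arguments("gsa", "1KG", "logs")))
```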