

(HOWTO) Run GATK CNV (and GATK ACNV) using premade Queue scripts (Broad Internal)

LeeTL1220 (Arlington, MA), Member, Broadie, Dev
edited December 2017 in GATK 4 Beta

Notice: these workflows are out of date and should no longer be used. See the WDL scripts instead.


These workflows are not officially supported, but they are used extensively for our internal evaluations.

These instructions assume some familiarity with running Queue.

Download Queue Scala scripts

Download the following files to the same directory. This step only needs to be done once.

Please note: These locations are temporary and will be updated in the future.

Test that invoking the help documentation works

java -jar recapseg-hb-eval.jar -S CreatePoNPipeline.scala --help
java -jar recapseg-hb-eval.jar -S CaseSampleHBExomePipeline.scala --help

Example case sample run with ACNV

Parameters to the Queue jar (recapseg-hb-eval.jar) may need to be adjusted depending on your execution environment.

#!/bin/bash -l
. /broad/tools/scripts/useuse
use .hdfview-2.9
use Java-1.8
use .r-3.1.3-gatk-only

### Modify parameters below this line

# Two simple text files listing each bam file on a separate line
#   Note that the lines in each file must correspond to case-control pairs.
CASE_SAMPLES=tumor_bam_files.txt
CONTROL_SAMPLES=normal_bam_files.txt

# Output location (absolute paths recommended).  
#  Seg file will appear in $OUTDIR/tumor_pcov
#  Calls will appear in $OUTDIR/caller
OUTDIR=/home/username/evals/out_case/

# Downloaded gatk-protected.jar
GATK4PJAR=gatk-protected.jar

# Directory of HDF5 JNI shared libraries (.so/.dylib)
HDFLOC=/broad/software/free/Linux/redhat_6_x86_64/pkgs/hdfview_2.9/HDFView/lib/linux/

# Reference (This is b37 at the Broad Institute)
REF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta

# Target list.  Must be the same as was used to create the PoN.
TARGETS=/home/lichtens/my_target_list.bed

# PoN file must be created with the same target file (above) and settings.
PON=/home/lichtens/evals/out_pon/create_pon/my_blood_normals.pon

# How much memory to allocate to each job, in GB
MEM=8

# List of SNPs in the interval_list format
SNP_LIST=my_list_of_common_het_sites.interval_list

# See --help for descriptions of these parameters.  
#  If `-keepDups` was specified for the PoN, it must be specified for case samples as well.
#  If `-pd 250` was specified for the PoN, it must be specified for case samples as well.
# -rawcov will generate a separate file with the raw counts (produced in a separate process).
PD=250
OTHER_OPTS=" -jobResReq virtual_free=${MEM}G -keepDups -rawcov -jobQueue gsa -pd ${PD} -noWt "
OTHER_OPTS_ACNV=" -acnv  -snp ${SNP_LIST}  -icontrol ${CONTROL_SAMPLES} "

#### Do not modify below this line

# Run the sample(s)
java -jar recapseg-hb-eval.jar -S CaseSampleHBExomePipeline.scala -mem ${MEM} -pon ${PON} -i ${CASE_SAMPLES} -o ${OUTDIR} -gatk4pjar ${GATK4PJAR} -r ${REF} -L ${TARGETS} -qsub -run -logDir ${OUTDIR} -hvl ${HDFLOC} ${OTHER_OPTS} ${OTHER_OPTS_ACNV}
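The two input lists must stay paired line-for-line (line N of the case list is matched with line N of the control list). A minimal pre-flight sketch, assuming the same two filenames as the script above; the BAM names here are made up for illustration:

```shell
# Pre-flight check: the case and control lists must have the same line count,
# since the pipeline pairs them line-for-line.
CASE_SAMPLES=tumor_bam_files.txt
CONTROL_SAMPLES=normal_bam_files.txt

# Illustrative contents only; in practice these files already exist.
printf 'tumor1.bam\ntumor2.bam\n'   > "$CASE_SAMPLES"
printf 'normal1.bam\nnormal2.bam\n' > "$CONTROL_SAMPLES"

if [ "$(wc -l < "$CASE_SAMPLES")" -eq "$(wc -l < "$CONTROL_SAMPLES")" ]; then
    echo "lists are paired"
else
    echo "case/control line counts differ" >&2
    exit 1
fi
```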

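The `-snp` input (`SNP_LIST` above) uses the Picard-style interval_list format: a SAM-style header followed by tab-separated columns for contig, 1-based start, end, strand, and name. A minimal illustrative file might look like the following (the `@SQ` lines should match the sequence dictionary of your reference; coordinates and site names here are invented):

```
@HD	VN:1.6	SO:coordinate
@SQ	SN:1	LN:249250621
1	808922	808922	+	common_het_1
1	989806	989806	+	common_het_2
```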
Example case sample run without ACNV

Parameters to the Queue jar (recapseg-hb-eval.jar) may need to be adjusted depending on your execution environment.

#!/bin/bash -l
. /broad/tools/scripts/useuse
use .hdfview-2.9
use Java-1.8
use .r-3.1.3-gatk-only

### Modify parameters below this line

# A simple text file listing each bam file on a separate line
INPUT_BAMS=case_bam_list.txt

# Output location (absolute paths recommended).  
#  Seg file will appear in $OUTDIR/tumor_pcov
#  Calls will appear in $OUTDIR/caller
OUTDIR=/home/username/evals/out_case/

# Downloaded gatk-protected.jar
GATK4PJAR=gatk-protected.jar

# Directory of HDF5 JNI shared libraries (.so/.dylib)
HDFLOC=/broad/software/free/Linux/redhat_6_x86_64/pkgs/hdfview_2.9/HDFView/lib/linux/

# Reference (This is b37 at the Broad Institute)
REF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta

# Target list.  Must be the same as was used to create the PoN.
TARGETS=/home/lichtens/my_target_list.bed

# PoN file must be created with the same target file (above) and settings.
PON=/home/lichtens/evals/out_pon/create_pon/my_blood_normals.pon

# How much memory to allocate to each job, in GB
MEM=8

# See --help for descriptions of these parameters.  
#  If `-keepDups` was specified for the PoN, it must be specified for case samples as well.
#  If `-pd 250` was specified for the PoN, it must be specified for case samples as well.
# -rawcov will generate a separate file with the raw counts (produced in a separate process).
PD=250
OTHER_OPTS=" -jobResReq virtual_free=${MEM}G -keepDups -rawcov -jobQueue gsa -pd ${PD} -noWt "

#### Do not modify below this line

# Run the sample(s)
java -jar recapseg-hb-eval.jar -S CaseSampleHBExomePipeline.scala -mem ${MEM} -pon ${PON} -i ${INPUT_BAMS} -o ${OUTDIR} -gatk4pjar ${GATK4PJAR} -r ${REF} -L ${TARGETS} -qsub -run -logDir ${OUTDIR} -hvl ${HDFLOC} ${OTHER_OPTS}
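Per the comments above, the seg file lands in $OUTDIR/tumor_pcov and the calls in $OUTDIR/caller. A small post-run sketch for confirming those directories exist; a throwaway temp directory stands in for the real OUTDIR here:

```shell
# Post-run check: confirm the expected output subdirectories are present.
OUTDIR="$(mktemp -d)/"
# mkdir stands in for the pipeline actually creating these directories.
mkdir -p "${OUTDIR}tumor_pcov" "${OUTDIR}caller"

for d in tumor_pcov caller; do
    if [ -d "${OUTDIR}${d}" ]; then
        echo "found ${d}"
    else
        echo "missing ${d}" >&2
    fi
done
```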

Example create PoN run

Parameters to the Queue jar (recapseg-hb-eval.jar) may need to be adjusted depending on your execution environment.

#!/bin/bash -l
. /broad/tools/scripts/useuse
use .hdfview-2.9
use Java-1.8
use .r-3.1.3-gatk-only

### Modify parameters below this line

# A simple text file listing each bam file on a separate line
INPUT_BAMS=blood_normals_bam_list.txt

# Output location (absolute paths recommended).  
#  PoN file will appear as $OUTDIR/create_pon/$PON_FILENAME
OUTDIR=/home/lichtens/evals/out_pon/

# Downloaded gatk-protected.jar
GATK4PJAR=gatk-protected.jar

# Directory of HDF5 JNI shared libraries (.so/.dylib)
HDFLOC=/broad/software/free/Linux/redhat_6_x86_64/pkgs/hdfview_2.9/HDFView/lib/linux/

# Reference (This is b37 at the Broad Institute)
REF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta

# Base PoN filename
#  This parameter is only the base filename.  Do not specify a directory.  This file will appear as
#  $OUTDIR/create_pon/$PON_FILENAME
PON_FILENAME=my_blood_normals.pon

# Target list that was used in the capture process.
TARGETS=/home/lichtens/my_target_list.bed

# How much memory to allocate to each coverage job, in GB.  
# 208k targets needs approx. 6GB RAM
MEM=6

#  Larger PoNs can require a lot of RAM (208k targets x 300 samples needs approx. 14GB RAM)
MEM_PON=14

# Number of cores to use when multicore functionality is available.
CORES=4

# See --help for descriptions of these parameters.  
#  If `-keepDups` was specified for the PoN, it must be specified for case samples as well.
#  If `-pd 250` was specified for the PoN, it must be specified for case samples as well.
# `-rawcov` will generate a separate file with the raw counts (produced in a separate process).
OTHER_OPTS=" -jobResReq 'virtual_free=${MEM}G' -keepDups -rawcov -jobQueue gsa -pd 250 -sparkMaster 'local[${CORES}]' "

#### Do not modify below this line

# Run the sample(s)
java -jar recapseg-hb-eval.jar -S CreatePoNPipeline.scala -pon ${PON_FILENAME} -i ${INPUT_BAMS} -o ${OUTDIR} -gatk4pjar ${GATK4PJAR} -r ${REF} -L ${TARGETS} -qsub -run -logDir ${OUTDIR} -hvl ${HDFLOC} -mem ${MEM} -mem_pon ${MEM_PON} ${OTHER_OPTS}
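As the comments above note, the PoN file appears as $OUTDIR/create_pon/$PON_FILENAME. A one-line sketch for composing that final path from the script's variables (values copied from the example above):

```shell
# Compose the final PoN path; ${OUTDIR%/} strips any trailing slash first.
OUTDIR=/home/lichtens/evals/out_pon/
PON_FILENAME=my_blood_normals.pon
PON_PATH="${OUTDIR%/}/create_pon/${PON_FILENAME}"
echo "$PON_PATH"
```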

Post edited by Geraldine_VdAuwera on
