Elapsed time of the CNVDiscoveryPipeline

Hi Bob, why is the CNVDiscoveryPipeline so time-consuming? I am testing a WGS sample (about 30x); it has been running for about 4 days and is still running. This is my CNVDiscoveryPipeline script:


# If you adapt this script for your own use, you will need to set these two variables based on your environment.

# SV_DIR is the installation directory for SVToolkit - it must be an exported environment variable.

# SV_TMPDIR is a directory for writing temp files, which may be large if you have a large data set.

export SV_DIR=/work/SoftW/svtoolkit


# These executables must be on your path.

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

export PATH=${SV_DIR}/bwa:${PATH}


mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile conf/genstrip_parameters.txt \
-R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta \
-I ${inputFile} \
-md ${runDir}/metadata \
-runDirectory ${runDir} \
-jobLogDir ${runDir}/logs \
-intervalList /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.interval.list \
-genderMapFile /work1/wsh/4.test/1.perl/1.pipetest/WGS/2016006L-3-1_gender.map \
-jobRunner Shell \
--disableJobReport \
-tempDir ${SV_TMPDIR} \
-gatkJobRunner Shell \
-retry 10 \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500 \
-genotypingParallelRecords 500




Could you help me check whether my script contains any mistakes? Thank you very much.



  • bhandsaker Member, Broadie, Moderator admin

    The pipeline is designed to run on multiple samples (generally 20 to 30 or more, but for best results batches of 100 or so are preferable). I'm not sure what the expected behavior would be on one sample, but it's possible this is causing it to run for an excessively long time.

  • bhandsaker Member, Broadie, Moderator admin

    If you want to run a small test, it is better to use multiple samples but run on a small interval (e.g. using -intervalList with a single small interval).
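
The small test suggested above could be prepared roughly as follows. This is only a sketch under several assumptions: the BAM paths and file names are hypothetical placeholders, the three entries stand in for a real batch of samples, -I is assumed to accept a text file with a .list extension containing one BAM path per line, and the interval is assumed to be written in GATK-style contig:start-end form using the contig names of the reference passed to -R (here "20", as in Homo_sapiens_assembly19).

```shell
#!/usr/bin/env bash
# Sketch only: every path and file name below is a hypothetical placeholder.
set -euo pipefail

work_dir="$(mktemp -d)"

# One BAM path per line. The pipeline is designed for batches of samples,
# so a real test would list many more than the three placeholders here.
bam_list="${work_dir}/test_bams.list"
printf '%s\n' \
    /path/to/sample1.bam \
    /path/to/sample2.bam \
    /path/to/sample3.bam > "${bam_list}"

# A single small interval (assumed GATK-style contig:start-end format).
interval_list="${work_dir}/test.interval.list"
printf '%s\n' '20:1000000-2000000' > "${interval_list}"

# Report what was written so the files can be checked before a long run.
sample_count="$(wc -l < "${bam_list}" | tr -d ' ')"
interval_count="$(wc -l < "${interval_list}" | tr -d ' ')"
echo "samples=${sample_count} intervals=${interval_count}"
echo "pass to the pipeline: -I ${bam_list} -intervalList ${interval_list}"
```

The two generated files would replace the values given to -I and -intervalList in the script from the question. The gender map file would also need one entry per sample, which this sketch does not cover.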