
Elapsed time of the CNVDiscoveryPipeline

Hi Bob, why is the CNVDiscoveryPipeline so time-consuming? I am testing a single WGS sample (about 30x); it has been running for about 4 days and is still running. This is my script for the CNVDiscoveryPipeline:


# If you adapt this script for your own use, you will need to set these two variables based on your environment.

# SV_DIR is the installation directory for SVToolkit - it must be an exported environment variable.

# SV_TMPDIR is a directory for writing temp files, which may be large if you have a large data set.

export SV_DIR=/work/SoftW/svtoolkit


# These executables must be on your path.

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

export PATH=${SV_DIR}/bwa:${PATH}

mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile conf/genstrip_parameters.txt \
-R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta \
-I ${inputFile} \
-md ${runDir}/metadata \
-runDirectory ${runDir} \
-jobLogDir ${runDir}/logs \
-intervalList /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.interval.list \
-genderMapFile /work1/wsh/4.test/1.perl/1.pipetest/WGS/2016006L-3-1_gender.map \
-jobRunner Shell \
--disableJobReport \
-tempDir ${SV_TMPDIR} \
-gatkJobRunner Shell \
-retry 10 \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500 \
-genotypingParallelRecords 500 \




Could you help me check whether there are any mistakes in my script? Thank you very much.



  • bhandsaker (Member, Broadie)

    The pipeline is designed to run on multiple samples (generally 20 to 30 or more, but for best results batches of 100 or so are preferable). I'm not sure what the expected behavior would be on one sample, but it's possible this is causing it to run for an excessively long time.

  • bhandsaker (Member, Broadie)

    If you want to run a small test, it is better to use multiple samples but run on a small interval (e.g. using -intervalList with a single small interval).
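A minimal sketch of how the inputs for such a small test might be prepared. It only writes two text files and does not launch the pipeline. The interval coordinates, the file names, and the BAM paths are hypothetical placeholders, not values from this thread. The contig name "20" assumes the Homo_sapiens_assembly19 naming (no "chr" prefix), and the sketch assumes that -I accepts a text file with a .list extension naming one BAM per line.

```shell
#!/bin/sh
# Sketch: prepare inputs for a small multi-sample test run.
# All paths and coordinates below are placeholders (assumptions).
set -eu

workDir=$(mktemp -d)

# A single small interval, one "contig:start-end" entry per line.
intervalList="${workDir}/test.interval.list"
printf '%s\n' '20:1000000-2000000' > "${intervalList}"

# A list of input BAMs, one path per line (hypothetical paths).
# Twenty samples matches the lower end of the batch size suggested above.
bamList="${workDir}/test_bams.list"
: > "${bamList}"
i=1
while [ "$i" -le 20 ]; do
    printf '/path/to/bams/sample%02d.bam\n' "$i" >> "${bamList}"
    i=$((i + 1))
done

# Report what was written.
intervalCount=$(wc -l < "${intervalList}" | tr -d ' ')
sampleCount=$(wc -l < "${bamList}" | tr -d ' ')
echo "intervals=${intervalCount} samples=${sampleCount}"
```

In the script from the question, these files would replace the existing values, i.e. `-intervalList ${intervalList}` and `-I ${bamList}`. The gender map file would presumably also need an entry for every sample in the list.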
