CNVDiscoveryPipeline fails at Stage5 -- no warn messages

pjmtele Member

Hi, I'm using Genome STRiP CNVDiscoveryPipeline (v2.00.1833) on WGS data from a collection of inbred maize lines. I have populated the metadata directory to the best of my ability and was able to get SVPreprocess to complete successfully in 3 batches. All files in the metadata directory seem sensible with the exception of sample_gender.report.txt, which is blank except for the header.

I am running into difficulty within the CNVDiscoveryPipeline at Stage5. The error message is:

INFO  15:49:25,058 QJobsReporter - Writing JobLogging GATKReport to file /panfs/roc/groups/14/hirschc1/pmonnaha/CNVDiscoveryPipeline.jobreport.txt 
INFO  15:49:25,083 QJobsReporter - Plotting JobLogging GATKReport to file /panfs/roc/groups/14/hirschc1/pmonnaha/CNVDiscoveryPipeline.jobreport.pdf 
WARN  15:49:26,351 RScriptExecutor - RScript exited with 1. Run with -l DEBUG for more info. 
INFO  15:49:26,352 QCommandLine - Done with errors 
INFO  15:49:26,353 QGraph - ------- 
INFO  15:49:26,354 QGraph - Failed:   'java'  '-Xmx14336m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/panfs/roc/groups/14/hirschc1/pmonnaha/.queue/tmp'  '-cp' '/home/hirschc1/pmonnaha/software/svtoolkit/lib/SVToolkit.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/Queue.jar'  '-cp' '/home/hirschc1/pmonnaha/software/svtoolkit/lib/SVToolkit.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.gatk.queue.QCommandLine'  '-cp' '/home/hirschc1/pmonnaha/software/svtoolkit/lib/SVToolkit.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/Queue.jar'  '-S' '/home/hirschc1/pmonnaha/software/svtoolkit/qscript/discovery/cnv/CNVDiscoveryStage5.q'  '-S' '/home/hirschc1/pmonnaha/software/svtoolkit/qscript/discovery/cnv/CNVDiscoveryStageBase.q' '-S' '/home/hirschc1/pmonnaha/software/svtoolkit/qscript/discovery/cnv/CNVDiscoveryGenotyper.q'  '-S' '/home/hirschc1/pmonnaha/software/svtoolkit/qscript/SVQScript.q'  '-gatk' '/home/hirschc1/pmonnaha/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar'  '-jobLogDir' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/cnv_stage5/logs'  '-memLimit' '14.0'  '-jobRunner' 'Drmaa'  '-gatkJobRunner' 'Drmaa'  '-jobNative' '-l walltime=24:00:00'  -run  '-runDirectory' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/cnv_stage5'  '-sentinelFile' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/cnv_sentinel_files/stage_5.sent'  --disableJobReport  '-configFile' '/home/hirschc1/pmonnaha/software/svtoolkit/conf/genstrip_parameters.txt'  '-P' 'depth.parityCorrectionThreshold:null'  '-R' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_chr1-10.fasta'  '-ploidyMapFile' 
'/home/hirschc1/pmonnaha/misc-files/gstrip/W22_chr1-10.ploidymap.txt'  '-genderMapFile' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0/sample_gender.report.txt' '-genderMapFile' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1/sample_gender.report.txt' '-genderMapFile' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2/sample_gender.report.txt'  '-md' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0' '-md' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1' '-md' '/home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2'  -disableGATKTraversal  '-I' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/bam_headers/merged_headers.bam'  '-vpsReportsDirectory' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/cnv_stage4'  '-selectedSamplesList' '/panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/cnv_stage5/eval/DiscoverySamples.list'  
INFO  15:49:26,354 QGraph - Log:     /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/logs/CNVDiscoveryPipeline-44.out 
INFO  15:49:26,354 QCommandLine - Script failed: 61 Pend, 0 Run, 1 Fail, 43 Done 
------------------------------------------------------------------------------------------
Done. ------------------------------------------------------------------------------------------

However, the log file CNVDiscoveryPipeline-44.out says 'There were no warn messages'. Furthermore, it seems that the actual error happened earlier in the pipeline. The Stage3 merged.sites.vcf files contain only the headers and no variant records. Within the Stage2 results, several files look wrong: the ClusterSeparation.report.dat file has NA in every column except ID, and the GenotypeLikelihoodStats, VariantsPerSample, and SelectedVariants files are all empty. Oddly, the log files for Stage2 all say 'There were no warn messages'. An example of output from Stage1 looks like:

CHROM POS ID REF ALT QUAL FILTER INFO

chr2 1 CNV_chr2_1_1000 A . . END=1000;SVTYPE=CNV
chr2 500 CNV_chr2_500_1500 T . . END=1500;SVTYPE=CNV
chr2 1000 CNV_chr2_1000_2000 A . . END=2000;SVTYPE=CNV
chr2 1500 CNV_chr2_1500_2500 G . . END=2500;SVTYPE=CNV
chr2 2000 CNV_chr2_2000_3000 T . . END=3000;SVTYPE=CNV
chr2 2500 CNV_chr2_2500_3500 T . . END=3500;SVTYPE=CNV

The log files for Stage1 also do not point to any errors. Does anyone have an idea of what is going wrong, or where I should be looking to track down the error?
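For anyone hitting the same symptom, the header-only VCF check described above can be scripted rather than done by hand. This is only a minimal sketch: the function names `count_vcf_records` and `report_empty_vcfs` are hypothetical helpers, not part of Genome STRiP, and it assumes the stage outputs are plain or gzip-compressed VCF files somewhere under the run directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Genome STRiP) for spotting VCF files
# that contain header lines but no variant records.

# Print the number of non-empty, non-header lines in a VCF (plain or .gz).
count_vcf_records() {
    local vcf="$1"
    if [[ "$vcf" == *.gz ]]; then
        # grep -c exits non-zero when the count is 0, so mask that status.
        gzip -dc "$vcf" | grep -c '^[^#]' || true
    else
        grep -c '^[^#]' "$vcf" || true
    fi
}

# List every VCF under a directory that has zero variant records.
report_empty_vcfs() {
    local run_dir="$1" vcf n
    find "$run_dir" -type f \( -name '*.vcf' -o -name '*.vcf.gz' \) | sort |
    while read -r vcf; do
        n=$(count_vcf_records "$vcf")
        if [ "$n" -eq 0 ]; then
            echo "HEADER-ONLY: $vcf"
        fi
    done
}
```

Calling `report_empty_vcfs /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22` after sourcing the functions would list every stage output that has headers but no sites, which helps narrow down the first stage where the results went empty.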

My job script is:

module load java/jdk1.8.0_144
module load samtools
module load htslib/1.6
module load R/3.3.3
module load libdrmaa/1.0.13

SV_DIR="/home/hirschc1/pmonnaha/software/svtoolkit"
export LD_LIBRARY_PATH=${SV_DIR}:${LD_LIBRARY_PATH}
export SV_DIR
export PATH=${SV_DIR}:${PATH}
export LD_LIBRARY_PATH=/panfs/roc/msisoft/libdrmaa/1.0.13/lib/:${LD_LIBRARY_PATH}

classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx14g -cp ${classpath} \
     org.broadinstitute.gatk.queue.QCommandLine \
     -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
     -S ${SV_DIR}/qscript/SVQScript.q \
     -cp ${classpath} \
     -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
     -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
     -R /home/hirschc1/pmonnaha/misc-files/gstrip/W22_chr1-10.fasta \
     -I /home/hirschc1/pmonnaha/misc-files/gstrip/W22_E2_Bams.txt \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-0 \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-1 \
     -md /home/hirschc1/pmonnaha/misc-files/gstrip/W22_MetaData_E2-2 \
     -runDirectory /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22 \
     -jobLogDir /panfs/roc/scratch/pmonnaha/Maize/gstrip/w22/logs \
     -jobRunner Drmaa \
     -gatkJobRunner Drmaa \
     -P depth.parityCorrectionThreshold:null \
     -tilingWindowSize 1000 \
     -tilingWindowOverlap 500 \
     -maximumReferenceGapLength 1000 \
     -boundaryPrecision 100 \
     -minimumRefinedLength 500 \
     -retry 10 \
     -memLimit 14 \
     -startFromScratch \
     -jobNative '-l walltime=24:00:00' \
     -run

Answers

  • pjmtele Member

    Thanks for the tip. That seems to have done the trick. I searched through my error files and, surprisingly, don't see any mention of a problem with the input file.
