Preprocessing: Depth.dat file empty, Discovery: Error - No read count found

I'm running GenomeSTRiP with a modified version of the discovery.sh script from installtest.
The BAM files are all aligned to hg19, sorted, and have headers. They have previously run correctly through the GATK HaplotypeCaller pipeline.

Preprocessing runs with no errors. However, at completion metadata/depth.bin contains only headers and no depth entries. The hidden file .depth.dat.done is present.
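
A check like the following could catch the empty depth file before discovery is started. This is only a minimal sketch: the helper names count_data_lines and has_depth_entries are hypothetical, and it assumes the depth file is plain text with a fixed number of header lines (default 1), which may not match the real file layout.

```shell
# Hypothetical helper: count non-blank lines that follow the header.
# ASSUMPTION: the file is plain text with a fixed number of header lines
# (default 1). The real depth file layout may differ.
count_data_lines() {
    file="$1"
    header_lines="${2:-1}"
    # Skip the header lines, then count the remaining lines with any field
    awk -v skip="$header_lines" 'NR > skip && NF > 0 { n++ } END { print n + 0 }' "$file"
}

# Succeeds only if the file exists and has at least one data line
has_depth_entries() {
    [ -f "$1" ] && [ "$(count_data_lines "$1" "${2:-1}")" -gt 0 ]
}

# Possible use between the preprocessing and discovery steps
# (depth_file is a placeholder for whichever file preprocessing wrote):
#   has_depth_entries "$depth_file" || { echo "depth file has no entries" >&2; exit 1; }
```

Stopping at this point would show the problem at the preprocessing output, rather than later as an exception in the discovery step.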

The discovery step throws an error:

ERROR MESSAGE: No read count found for sample: ADR114_36017_O_06_21_13

Logs are attached.

My commands to run preprocessing and discovery are:

Run preprocessing.

java -cp ${classpath} ${mx} \
org.broadinstitute.sting.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-cp ${classpath} \
-configFile conf/genstrip_installtest_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta \
-genomeMaskFile ${SV_DIR}/ref_data/Homo_sapiens_assembly19.mask.101.fasta \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-jobLogDir ${runDir}/logs \
-I bamfile.list \
-L ../gene_list_2kbFlanking.list \
-run \
|| exit 1

Run discovery.

java -cp ${classpath} ${mx} \
org.broadinstitute.sting.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVDiscovery.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-cp ${classpath} \
-configFile conf/genstrip_installtest_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta \
-genomeMaskFile ${SV_DIR}/ref_data/Homo_sapiens_assembly19.mask.101.fasta \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-jobLogDir ${runDir}/logs \
-minimumSize 100 \
-maximumSize 1000000 \
-I bamfile.list \
-L ../gene_list_2kbFlanking.list \
-O ${sites} \
-run \
|| exit 1
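
Both commands pass a reference (-R) and a genome mask (-genomeMaskFile) that come from differently named files, so one cheap thing to rule out is a difference in sequence names between the two. The sketch below is an assumption-laden illustration, not a diagnosis: missing_contigs is a hypothetical helper, and it assumes both FASTA files have .fai indexes (for example as written by samtools faidx), where column 1 is the sequence name.

```shell
# Hypothetical helper: print sequence names that appear in the first
# FASTA index (.fai) but not in the second.
# ASSUMPTION: both FASTA files have .fai indexes next to them.
missing_contigs() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # Column 1 of a .fai file holds the sequence name
    cut -f1 "$1" | LC_ALL=C sort -u > "$a"
    cut -f1 "$2" | LC_ALL=C sort -u > "$b"
    # comm -23 prints lines found only in the first sorted input
    LC_ALL=C comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Possible use with the paths from the commands above:
#   missing_contigs /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta.fai \
#       ${SV_DIR}/ref_data/Homo_sapiens_assembly19.mask.101.fasta.fai
```

Empty output would mean every reference sequence name also exists in the mask. Any names printed would be worth a closer look.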

Screen output is:

[jarvis temp_svtoolkit]$ sh ./run_GenomeSTRiP.sh
SVToolkit version 1.03 (build 619)
Build date: 2011/02/12 17:23:22
Web site: http://www.broadinstitute.org/gsa/wiki/index.php/Genome_STRiP
INFO 11:54:22,878 QScriptManager - Compiling 2 QScripts
INFO 11:54:27,505 QScriptManager - Compilation complete
INFO 11:54:29,284 HelpFormatter - ---------------------------------------------------------
INFO 11:54:29,284 HelpFormatter - Program Name: org.broadinstitute.sting.queue.QCommandLine
INFO 11:54:29,285 HelpFormatter - Program Args: -S /cccbstore-rc/projects/cccb/apps/svtoolkit//qscript/SVPreprocess.q -S /cccbstore-rc/projects/cccb/apps/svtoolkit//qscript/SVQScript.q -gatk /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar -configFile conf/genstrip_installtest_parameters.txt -tempDir ./tmpdir -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -genomeMaskFile /cccbstore-rc/projects/cccb/apps/svtoolkit//ref_data/Homo_sapiens_assembly19.mask.101.fasta -runDirectory test1 -md test1/metadata -jobLogDir test1/logs -I bamfile.list -L ../gene_list_2kbFlanking.list -run
INFO 11:54:29,285 HelpFormatter - Date/Time: 2014/01/30 11:54:29
INFO 11:54:29,285 HelpFormatter - ---------------------------------------------------------
INFO 11:54:29,285 HelpFormatter - ---------------------------------------------------------
INFO 11:54:29,286 QCommandLine - Scripting SVPreprocess
INFO 11:54:29,364 QCommandLine - Added 7 functions
INFO 11:54:29,364 QGraph - Generating graph.
INFO 11:54:29,383 QGraph - Running jobs.
INFO 11:54:29,417 ShellJobRunner - Starting: java -Xmx4g -Djava.io.tmpdir=/ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/tmpdir -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar org.broadinstitute.sting.gatk.CommandLineGATK -T ComputeInsertSizeDistributions -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -I /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/bamfile.list -O /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/metadata/isd/bamfile.list.hist.bin -md test1/metadata -createEmpty
INFO 11:54:29,417 ShellJobRunner - Output written to /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/logs/[email protected]
INFO 11:56:43,114 ShellJobRunner - Done: java -Xmx4g -Djava.io.tmpdir=/ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/tmpdir -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar org.broadinstitute.sting.gatk.CommandLineGATK -T ComputeInsertSizeDistributions -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -I /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/bamfile.list -O /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/metadata/isd/bamfile.list.hist.bin -md test1/metadata -createEmpty

For Discovery:

INFO 12:00:46,230 QScriptManager - Compiling 2 QScripts
INFO 12:00:50,810 QScriptManager - Compilation complete
INFO 12:00:52,591 HelpFormatter - ---------------------------------------------------------
INFO 12:00:52,591 HelpFormatter - Program Name: org.broadinstitute.sting.queue.QCommandLine
INFO 12:00:52,591 HelpFormatter - Program Args: -S /cccbstore-rc/projects/cccb/apps/svtoolkit//qscript/SVDiscovery.q -S /cccbstore-rc/projects/cccb/apps/svtoolkit//qscript/SVQScript.q -gatk /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar -configFile conf/genstrip_installtest_parameters.txt -tempDir ./tmpdir -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -genomeMaskFile /cccbstore-rc/projects/cccb/apps/svtoolkit//ref_data/Homo_sapiens_assembly19.mask.101.fasta -runDirectory test1 -md test1/metadata -jobLogDir test1/logs -minimumSize 100 -maximumSize 1000000 -I bamfile.list -L ../gene_list_2kbFlanking.list -O test1.discovery.vcf -run
INFO 12:00:52,592 HelpFormatter - Date/Time: 2014/01/30 12:00:52
INFO 12:00:52,592 HelpFormatter - ---------------------------------------------------------
INFO 12:00:52,592 HelpFormatter - ---------------------------------------------------------
INFO 12:00:52,593 QCommandLine - Scripting SVDiscovery
INFO 12:00:52,695 QCommandLine - Added 13 functions
INFO 12:00:52,695 QGraph - Generating graph.
INFO 12:00:52,741 QGraph - Running jobs.
INFO 12:00:52,783 ShellJobRunner - Starting: java -Xmx4g -Djava.io.tmpdir=/ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/tmpdir -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar -verbose:gc org.broadinstitute.sv.main.SVDiscovery -T SVDiscovery -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -I /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/bamfile.list -O /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/P0001.discovery.vcf -md test1/metadata -disableGATKTraversal -configFile conf/genstrip_installtest_parameters.txt -runDirectory test1 -genomeMaskFile /cccbstore-rc/projects/cccb/apps/svtoolkit/ref_data/Homo_sapiens_assembly19.mask.101.fasta -partitionName P0001 -filePrefix P0001 -L chr11:17097252-17231549 -searchLocus chr11:17097252-17231549 -searchWindow chr11:17097252-17231549 -searchMinimumSize 100 -searchMaximumSize 1000000
INFO 12:00:52,783 ShellJobRunner - Output written to /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/logs/[email protected]
ERROR 12:01:04,042 ShellJobRunner - Error: java -Xmx4g -Djava.io.tmpdir=/ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/tmpdir -cp /cccbstore-rc/projects/cccb/apps/svtoolkit//lib/SVToolkit.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/GenomeAnalysisTK.jar:/cccbstore-rc/projects/cccb/apps/svtoolkit//lib/gatk/Queue.jar -verbose:gc org.broadinstitute.sv.main.SVDiscovery -T SVDiscovery -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -I /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/bamfile.list -O /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/P0001.discovery.vcf -md test1/metadata -disableGATKTraversal -configFile conf/genstrip_installtest_parameters.txt -runDirectory test1 -genomeMaskFile /cccbstore-rc/projects/cccb/apps/svtoolkit/ref_data/Homo_sapiens_assembly19.mask.101.fasta -partitionName P0001 -filePrefix P0001 -L chr11:17097252-17231549 -searchLocus chr11:17097252-17231549 -searchWindow chr11:17097252-17231549 -searchMinimumSize 100 -searchMaximumSize 1000000
ERROR 12:01:04,042 ShellJobRunner - [GC 525312K->63810K(2010112K), 0.0667620 secs]
INFO 12:00:56,002 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:00:56,004 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.0.5039M, Compiled 2011/01/20 22:58:34
INFO 12:00:56,004 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 12:00:56,004 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
INFO 12:00:56,004 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
INFO 12:00:56,005 HelpFormatter - Program Args: -T SVDiscovery -R /cccbstore-rc/projects/db/gatk/hg19/ucsc.hg19.fasta -O /ifs/labs/cccb/projects/cccb/instruments/SN389/Runs/131127_SN389_0430_AC38TRACXX/Data/Intensities/BaseCalls/Projects.First/Project_First_Last/temp_svtoolkit/test1/P0001.discovery.vcf -md test1/metadata -disableGATKTraversal -configFile conf/genstrip_installtest_parameters.txt -runDirectory test1 -genomeMaskFile /cccbstore-rc/projects/cccb/apps/svtoolkit/ref_data/Homo_sapiens_assembly19.mask.101.fasta -partitionName P0001 -filePrefix P0001 -L chr11:17097252-17231549 -searchLocus chr11:17097252-17231549 -searchWindow chr11:17097252-17231549 -searchMinimumSize 100 -searchMaximumSize 1000000
INFO 12:00:56,005 HelpFormatter - Date/Time: 2014/01/30 12:00:56
INFO 12:00:56,005 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:00:56,005 HelpFormatter - ---------------------------------------------------------------------------
INFO 12:00:56,023 GenomeAnalysisEngine - Strictness is SILENT
INFO 12:00:56,175 SVDiscovery - Initializing SVDiscovery ...
INFO 12:00:56,226 SVDiscovery - Opening reference sequence ...
INFO 12:00:56,227 SVDiscovery - Opened reference sequence.
INFO 12:00:56,227 SVDiscovery - Opening genome mask ...
INFO 12:00:56,227 SVDiscovery - Opened genome mask.
INFO 12:00:56,227 SVDiscovery - Initializing input data set ...
INFO 12:00:56,308 SVDiscovery - Initialized data set: 10 files, 3 read groups, 10 samples.
INFO 12:00:56,308 SVDiscovery - Opening metadata from test1/metadata ...
INFO 12:00:56,309 SVDiscovery - Opened metadata.
INFO 12:00:56,310 SVDiscovery - Initializing discovery algorithm ...
INFO 12:00:56,311 SVDiscovery - Loading insert size histograms ...
INFO 12:00:56,388 SVDiscovery - Loaded 10 histograms.
INFO 12:00:56,477 SVDiscovery - Processing locus: chr11:17097252-17231549:100-1000000
INFO 12:00:56,477 SVDiscovery - Locus search window: chr11:17097252-17231549
INFO 12:00:56,477 SVDiscovery - Selecting read pairs ...
INFO 12:00:56,478 SVDiscovery - Reading input file ../Sample_ADR87_19787_O_06_21_13/BWA_outM1/ADR87_19787_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:57,108 SVDiscovery - Reading input file ../Sample_ADR114_36017_O_06_21_13/BWA_outM1/ADR114_36017_O_06_21_13.rmdup.sorted.headers.bam ...
[GC 589122K->45983K(2535424K), 0.0416200 secs]
INFO 12:00:57,278 SVDiscovery - Reading input file ../Sample_ADR3_5032_O_06_21_13/BWA_outM1/ADR3_5032_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:57,495 SVDiscovery - Reading input file ../Sample_ADR9_18420_O_06_21_13/BWA_outM1/ADR9_18420_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:57,625 SVDiscovery - Reading input file ../Sample_ADR61_66_O_06_21_13/BWA_outM1/ADR61_66_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:58,688 SVDiscovery - Reading input file ../Sample_ADR67_34206_O_06_21_13/BWA_outM1/ADR67_34206_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:58,787 SVDiscovery - Reading input file ../Sample_ADR156_Z_22648_07_08_13/BWA_outM1/ADR156_Z_22648_07_08_13.rmdup.sorted.headers.bam ...
INFO 12:00:59,140 SVDiscovery - Reading input file ../Sample_ADR18_82241_O_06_21_13/BWA_outM1/ADR18_82241_O_06_21_13.rmdup.sorted.headers.bam ...
INFO 12:00:59,988 SVDiscovery - Reading input file ../Sample_ADR155_Z_25458_07_08_13/BWA_outM1/ADR155_Z_25458_07_08_13.rmdup.sorted.headers.bam ...
INFO 12:01:00,362 SVDiscovery - Reading input file ../Sample_ADR83_76972_O_06_21_13/BWA_outM1/ADR83_76972_O_06_21_13.rmdup.sorted.headers.bam ...
[GC 1096607K->13282K(2535424K), 0.0100590 secs]
INFO 12:01:01,803 SVDiscovery - Processing clusters ...
INFO 12:01:01,982 SVDiscovery - Clustering: Generating clusters for 2 read pairs.
INFO 12:01:01,983 SVDiscovery - Clustering: Generating clusters for 2 read pairs.
INFO 12:01:01,983 SVDiscovery - Processing cluster chr11:17099173-17099508 chr11:17100308-17100700 LR 2

DBG: RC Cache fill chr11:17089509-17189508 100000 10 1.757184 sec

Error: Exception processing cluster: No read count found for sample: ADR114_36017_O_06_21_13
Cluster: chr11:17099173-17099508 chr11:17100308-17100700 LR 2
INFO 12:01:03,819 GATKRunReport - Aggregating data for run report
[GC 687505K->10279K(2708992K), 0.0086560 secs]
[Full GC 10279K->9734K(2708992K), 0.0440250 secs]

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.RuntimeException: No read count found for sample: ADR114_36017_O_06_21_13
at org.broadinstitute.sv.discovery.ClusterDepthModule.computeDepth(ClusterDepthModule.java:152)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processCluster(DeletionDiscoveryAlgorithm.java:395)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processClusters(DeletionDiscoveryAlgorithm.java:303)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:163)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:150)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:43)
at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:75)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:217)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:111)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:110)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:239)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:72)
at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 1.0.5039M):
ERROR
ERROR Please visit to wiki to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
ERROR
ERROR MESSAGE: No read count found for sample: ADR114_36017_O_06_21_13
ERROR ------------------------------------------------------------------------------------------