SVDiscovery error: computeSampleLongMap

I encountered a problem when running the SVDiscovery pipeline.

The SVDiscovery script I used is:

#!/bin/bash

export SV_DIR=/work/SoftW/svtoolkit
runDir=IEMS
SV_TMPDIR=IEMS/tmpdir_SVDiscovery
inputFile=IEMS.list
sites=IEMS.discovery.vcf
genotypes=IEMS.genotypes.vcf

# These executables must be on your path.

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

mx="-Xmx4g"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

# Run discovery.

java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-cp ${classpath} \
-S ${SV_DIR}/qscript/SVDiscovery.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-configFile conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta \
-genomeMaskFile /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.svmask.fasta \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-genderMapFile IEMS_gender.map \
-disableGATKTraversal \
-jobLogDir ${runDir}/logs \
-suppressVCFCommandLines \
-sample IEMS_sample.list \
-minimumSize 100 \
-maximumSize 1000000 \
-I ${inputFile} \
-O ${sites} \
-run

And the error is:

INFO 16:44:56,231 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 16:44:56,234 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7.GS-r1748-0-g74bfe0b, Compiled 2017/10/06 08:08:49
INFO 16:44:56,234 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
INFO 16:44:56,234 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
INFO 16:44:56,234 HelpFormatter - [Fri Jan 26 16:44:56 CST 2018] Executing on Linux 2.6.32-696.6.3.el6.x86_64 amd64
INFO 16:44:56,234 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15
INFO 16:44:56,240 HelpFormatter - Program Args: -T SVDiscoveryWalker -R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta -O /work1/wsh/4.test/1.perl/1.pipetest/IEMS/IEMS/P0001.discovery.vcf.gz -disableGATKTraversal true -md IEMS/metadata -configFile conf/genstrip_parameters.txt -runDirectory IEMS -genderMapFile IEMS_gender.map -genomeMaskFile /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.svmask.fasta -partitionName P0001 -runFilePrefix P0001 -storeReadPairFile true -L 1:1-249250621 -searchLocus 1:1-249250621 -searchWindow 1:1-249250621 -searchMinimumSize 100 -searchMaximumSize 1000000
INFO 16:44:56,241 HelpFormatter - Executing as wsh@localhost on Linux 2.6.32-696.6.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_141-b15.
INFO 16:44:56,241 HelpFormatter - Date/Time: 2018/01/26 16:44:56
INFO 16:44:56,241 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 16:44:56,242 HelpFormatter - -----------------------------------------------------------------------------------------
INFO 16:44:56,247 26-Jan-2018 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:44:56,385 26-Jan-2018 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 16:44:56,408 26-Jan-2018 IntervalUtils - Processing 249250621 bp from intervals
INFO 16:44:56,461 26-Jan-2018 GenomeAnalysisEngine - Preparing for traversal
INFO 16:44:56,466 26-Jan-2018 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:44:56,466 26-Jan-2018 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:44:56,466 26-Jan-2018 ProgressMeter - | processed | time | per 1M | | total | remaining
INFO 16:44:56,466 26-Jan-2018 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
INFO 16:44:56,466 26-Jan-2018 SVDiscovery - Initializing SVDiscovery ...
INFO 16:44:56,467 26-Jan-2018 SVDiscovery - Reading configuration file ...
INFO 16:44:56,470 26-Jan-2018 SVDiscovery - Read configuration file.
INFO 16:44:56,470 26-Jan-2018 SVDiscovery - Opening reference sequence ...
INFO 16:44:56,470 26-Jan-2018 SVDiscovery - Opened reference sequence.
INFO 16:44:56,470 26-Jan-2018 SVDiscovery - Opening genome mask ...
INFO 16:44:56,471 26-Jan-2018 SVDiscovery - Opened genome mask.
INFO 16:44:56,471 26-Jan-2018 SVDiscovery - Initializing input data set ...
INFO 16:44:56,741 26-Jan-2018 SVDiscovery - Initialized data set: 25 files, 25 read groups, 25 samples.
INFO 16:44:56,743 26-Jan-2018 MetaData - Opening metadata ...
INFO 16:44:56,743 26-Jan-2018 MetaData - Adding metadata location IEMS/metadata ...
INFO 16:44:56,744 26-Jan-2018 MetaData - Opened metadata.
INFO 16:44:56,746 26-Jan-2018 SVDiscovery - Opened metadata.
INFO 16:44:56,750 26-Jan-2018 MetaData - Loading insert size histograms ...
INFO 16:44:56,858 26-Jan-2018 SVDiscovery - Processing locus: 1:1-249250621:100-1000000
INFO 16:44:56,858 26-Jan-2018 SVDiscovery - Locus search window: 1:1-249250621
INFO 16:45:26,469 26-Jan-2018 ProgressMeter - Starting 0.0 30.0 s 49.6 w 100.0% 30.0 s 0.0 s
INFO 16:45:27,838 26-Jan-2018 SVDiscovery - Discovery alt home filtering is disabled.
INFO 16:45:27,981 26-Jan-2018 SVDiscovery - Processing clusters ...

ERROR --
ERROR stack trace

java.lang.NullPointerException
at org.broadinstitute.sv.metadata.MetaData.computeSampleLongMap(MetaData.java:406)
at org.broadinstitute.sv.metadata.MetaData.getSampleReadSpanMap(MetaData.java:401)
at org.broadinstitute.sv.discovery.ClusterMembershipModule.init(ClusterMembershipModule.java:155)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.createMembershipModule(DeletionDiscoveryAlgorithm.java:1110)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.initClusterModules(DeletionDiscoveryAlgorithm.java:1055)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processClusters(DeletionDiscoveryAlgorithm.java:386)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:204)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:107)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:40)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:115)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:133)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:87)
at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7.GS-r1748-0-g74bfe0b):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Can anyone help me? Thank you very much!

Answers

  • @bhandsaker
    Thanks for your help. I will try your advice.
    The contents of ${runDir}/metadata/spans.dat are:
    SAMPLE LIBRARY READGROUP SPANCOVERAGE
    2017120044-IEMS bar 2017120044-IEMS 17046836
    The -sample argument is a sample list, because I have 25 samples.

  • bhandsaker Member, Broadie, Moderator

    Can you explain what you are trying to do?
    Are you trying to run on a single sample?
    Does spans.dat contain just two lines?
    That SPANCOVERAGE is extremely low, much less than 1x sequencing. Is that what you expect?
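
One way to answer the question about spans.dat is to compare it against the sample list. The following is a minimal sketch, not part of Genome STRiP: the function name missing_span_samples is hypothetical, and it assumes the sample list holds one sample ID per line and that spans.dat is whitespace-delimited with the single header line shown in this thread.

```shell
#!/bin/bash
# Print every sample that appears in the sample list but has no row in spans.dat.
# Assumptions (not confirmed by the thread): one sample ID per line in the list,
# whitespace-delimited spans.dat with one header line, sample ID in column 1.
missing_span_samples() {
    local sample_list=$1 spans_file=$2
    awk '
        # First file (spans.dat): remember each sample ID, skipping the header.
        FILENAME == ARGV[1] { if (FNR > 1 && NF > 0) seen[$1] = 1; next }
        # Second file (sample list): report IDs that were never seen.
        NF > 0 && !($1 in seen) { print $1 }
    ' "$spans_file" "$sample_list"
}

# Example call, using the file names from the poster's script:
# missing_span_samples IEMS_sample.list IEMS/metadata/spans.dat
```

If the function prints any sample IDs, the metadata directory does not cover every sample passed to discovery, which would be consistent with the situation described in this thread.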

  • @bhandsaker
    I have 25 samples, each with its own spans.dat, so each spans.dat contains just two lines.
    I just want to test Genome STRiP on part of the data (which is why SPANCOVERAGE is extremely low, much less than 1x sequencing), because I plan to use Genome STRiP on 25 WGS data sets.

  • @bhandsaker
    If I use a single sample, SVDiscovery and SVGenotyper succeed and produce the discovery.vcf and genotypes.vcf results. When I use 25 samples, I get the computeSampleLongMap error.

  • @bhandsaker
    Thank you for your help. The spans.dat files for the 25 samples are in the spans directory, like this:

    When I merged all the per-sample spans.dat files into a single spans.dat and ran SVDiscovery again, the computeSampleLongMap error no longer appeared. That fixed the error.
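
The poster does not show how the files were merged. A minimal sketch of one way to do it, assuming every per-sample spans.dat starts with the same one-line header (SAMPLE LIBRARY READGROUP SPANCOVERAGE); the function name merge_spans is hypothetical and not part of Genome STRiP:

```shell
#!/bin/bash
# Concatenate several spans.dat files into one, keeping only the first header.
# usage: merge_spans OUTPUT INPUT...
# Assumption: each input begins with an identical one-line header.
# OUTPUT must not be one of the INPUT files.
merge_spans() {
    local out=$1
    shift
    awk '
        FNR == 1 { if (NR == 1) print; next }   # header: keep the first, drop the rest
        NF > 0   { print }                      # data rows: keep, skipping blank lines
    ' "$@" > "$out"
}

# Example call; the input paths are placeholders for wherever the per-sample files live:
# merge_spans merged_spans.dat sampleA/spans.dat sampleB/spans.dat
```

This only works around the symptom. Preprocessing all samples together, as recommended later in this thread, avoids having to merge metadata by hand.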

  • @bhandsaker
    Thank you for your reply. Why don't the SVPreprocess and SVDiscovery scripts complete successfully the first time? They must be run a second time.

  • @bhandsaker
    Thank you for your reply. Every time I run SVPreprocess, the log file contains errors like:
    ERROR 17:28:21,087 FunctionEdge - Contents of /work1/wsh/4.test/1.perl/1.pipetest/IEMS/IEMS/logs/SVPreprocess-6.out
    So must I run SVPreprocess twice every time I use Genome STRiP?

  • bhandsaker Member, Broadie, Moderator

    If you can dig into the log files and post the error messages, we can try to take a look.
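
A minimal sketch of one way to pull the relevant lines out of the job logs, assuming the per-job output files end in .out as in the SVPreprocess-6.out path quoted above. The function name show_job_errors and the search pattern are illustrative choices, not something Genome STRiP provides:

```shell
#!/bin/bash
# List every line mentioning ERROR or Exception in the per-job log files,
# prefixed with the file name and line number.
# Assumption: job logs live in one directory and are named *.out.
show_job_errors() {
    local log_dir=$1
    # -H prints the file name, -n the line number, -E enables the alternation.
    grep -H -n -E 'ERROR|Exception' "$log_dir"/*.out 2>/dev/null
}

# Example call, using the log directory from the poster's script:
# show_job_errors IEMS/logs
```

The first file reporting an exception is usually the one worth posting in full, since later failures may just be downstream jobs that depended on it.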

  • bhandsaker Member, Broadie, Moderator

    It would also be helpful if you show the command line you are using to run SVPreprocess.
    You should preprocess all 25 samples together (there are other ways to do it, but this is the easiest and the least error prone).

  • @bhandsaker
    My SVPreprocess command line is as follows:

    #!/bin/bash

    # If you adapt this script for your own use, you will need to set these two variables based on your environment.
    # SV_DIR is the installation directory for SVToolkit - it must be an exported environment variable.
    # SV_TMPDIR is a directory for writing temp files, which may be large if you have a large data set.

    export SV_DIR=/work/SoftW/svtoolkit

    runDir=IEMS
    SV_TMPDIR=IEMS/tmpdir_SVPreprocess
    inputFile=IEMS.list
    sites=IEMS.discovery.vcf
    genotypes=IEMS.genotypes.vcf

    # These executables must be on your path.

    which java > /dev/null || exit 1
    which Rscript > /dev/null || exit 1
    which samtools > /dev/null || exit 1

    # For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

    export PATH=${SV_DIR}/bwa:${PATH}
    export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

    mx="-Xmx4g"
    classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

    mkdir -p ${runDir}/logs || exit 1
    mkdir -p ${runDir}/metadata || exit 1

    # Run preprocessing.
    # For large scale use, you should use -reduceInsertSizeDistributions, but this is too slow for the installation test.
    # The method employed by -computeGCProfiles requires a GC mask and is currently only supported for human genomes.

    java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile conf/genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.fasta \
    -genomeMaskFile /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.svmask.fasta \
    -copyNumberMaskFile /work/wsh/0.Pipeline/TargetSeq/Genome_STRiP_ref/Homo_sapiens_assembly19.gcmask.fasta \
    -genderMapFile IEMS_gender.map \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -useMultiStep \
    -reduceInsertSizeDistributions false \
    -computeGCProfiles true \
    -computeReadCounts true \
    -jobLogDir ${runDir}/logs \
    -I ${inputFile} \
    -run

    The error log file is: 1_SVPreprocess.sh.log
    The log directory is attached as: logs.rar

  • @bhandsaker
    Thank you very much. When I set the -retry parameter to a value greater than 1, SVPreprocess runs to completion in a single invocation.
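
A minimal sketch of where the -retry option could go in the SVPreprocess invocation. The value 2 is an assumption (the post only says "greater than 1"), the argument list is abbreviated, and the snippet only prints the command as a dry run instead of executing it:

```shell
#!/bin/bash
# Dry run: build the Queue command with -retry added and print it.
SV_DIR=${SV_DIR:-/work/SoftW/svtoolkit}   # path taken from the poster's script
runDir=IEMS
retries=2                                 # assumed value; the thread says "greater than 1"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

# Abbreviated argument list; the remaining options from the original
# script (reference, masks, metadata, inputs) would be added unchanged.
queue_args=(
    -cp "$classpath" -Xmx4g
    org.broadinstitute.gatk.queue.QCommandLine
    -S "${SV_DIR}/qscript/SVPreprocess.q"
    -S "${SV_DIR}/qscript/SVQScript.q"
    -jobLogDir "${runDir}/logs"
    -retry "$retries"
    -run
)

# Print instead of running, so the final command line can be inspected first.
echo java "${queue_args[@]}"
```

Retrying lets the run finish, but it hides whichever job failed on the first attempt, so the logs from that first attempt are still worth keeping.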

  • bhandsaker Member, Broadie, Moderator

    Do you believe this is deterministic? I.e. if you start clean, you always get the SampleLongMap error, but when you retry once the first retry always succeeds?

    If so, then it may be that we have a missing dependency in our Queue workflow definition. A missing dependency can cause this kind of problem, but we never encounter it when running larger workloads because things take more time so out-of-order execution never happens.
