SV Discovery error: exception processing cluster

mdistler (Los Angeles, Member)

Hello,

I am new to GenomeSTRiP. I installed the program and completed the install test successfully. With my own data, the SVPreprocess step also completes successfully. However, when I get to the SVDiscovery step, I repeatedly encounter the error shown below:

INFO 12:57:47,130 10-Jul-2016 SVDiscovery - Processing clusters ...
INFO 12:57:47,245 10-Jul-2016 ReadCountDiskCache - Initializing read count disk cache [practice1/metadata/rccache.bin] ...
INFO 12:57:47,246 10-Jul-2016 ReadCountDiskCache - Initialized read count disk cache with 1 file.
INFO 12:57:47,259 10-Jul-2016 SVDiscovery - No hapmap snp genotype directory specified
INFO 12:57:47,263 10-Jul-2016 SVDiscovery - No array intensity data specified
INFO 12:57:48,175 10-Jul-2016 SVDiscovery - Clustering: Generating clusters for 252 read pairs.
INFO 12:57:48,350 10-Jul-2016 SVDiscovery - Clustering: LR split size 252 / 252 maximal clique size 226 clique count 1
INFO 12:57:48,352 10-Jul-2016 SVDiscovery - Clustering: LR split size 26 / 252 maximal clique size 21 clique count 1
INFO 12:57:48,352 10-Jul-2016 SVDiscovery - Clustering: LR split size 5 / 252 maximal clique size 3 clique count 2
INFO 12:57:48,353 10-Jul-2016 SVDiscovery - Processing cluster 19:4817787-4818235 19:4820125-4820633 LR 21
Error: Exception processing cluster: null
Cluster: 19:4817787-4818235 19:4820125-4820633 LR 21
INFO 12:57:50,391 10-Jul-2016 GATKRunReport - Uploaded run statistics report to AWS S3

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.NullPointerException
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.writeVCFRecord(DeletionDiscoveryAlgorithm.java:547)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processCluster(DeletionDiscoveryAlgorithm.java:446)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.processClusters(DeletionDiscoveryAlgorithm.java:353)
at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:197)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:107)
at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:40)
at org.broadinstitute.gatk.engine.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:116)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:315)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:133)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:87)
at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

ERROR ------------------------------------------------------------------------------------------
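
When a Queue log is long, the failing cluster can be pulled out with standard tools. The following is a minimal sketch, assuming the log has been saved to a file and follows the layout shown above (an `Error: Exception processing cluster` line immediately followed by a `Cluster:` line). The function name `failed_clusters` is hypothetical and not part of Genome STRiP.

```shell
# Hypothetical helper: print each cluster that SVDiscovery failed on,
# one per line. Assumes the two-line "Error: ..." / "Cluster: ..." layout
# seen in the log excerpt above.
failed_clusters() {
    local log="$1"
    # Take each error line plus the line after it, then keep only the
    # "Cluster:" lines with their prefix stripped.
    grep -A1 '^Error: Exception processing cluster' "$log" \
        | sed -n 's/^Cluster: //p'
}

# Example use (the log path is an assumption):
# failed_clusters practice1/logs/SVDiscovery.log
```

The reported coordinates can then be used to inspect the reads in that region, for example with samtools.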

Some notes: I am using a non-human genome and omitted the genome mask. My script is below for reference. Any advice would be greatly appreciated.

#!/bin/bash

inputType=bam
if [ ! -z "$1" ]; then
    inputType="$1"
fi

runDir=practice1
genotypes=practice1.genotypes.vcf
sites=practice1.discovery.vcf

# These executables must be on your path.

which java > /dev/null || exit 1
which Rscript > /dev/null || exit 1
which samtools > /dev/null || exit 1

# For SVAltAlign, you must use the version of bwa compatible with Genome STRiP.

export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

mx="-Xmx4g"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

mkdir -p ${runDir}/logs || exit 1
mkdir -p ${runDir}/metadata || exit 1

# Display version information.

java -cp ${classpath} ${mx} -jar ${SV_DIR}/lib/SVToolkit.jar

# Run preprocessing.
# For large scale use, you should use -reduceInsertSizeDistributions, but this is too slow for the installation test.
# The method employed by -computeGCProfiles requires a GC mask and is currently only supported for human genomes.

java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R ~/project-zarlab/mouseBAM/chr19_new.fa \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-ploidyMapFile ~/project-jflint/mouseBAM/chr19.ploidymap.txt \
-reduceInsertSizeDistributions false \
-computeGCProfiles true \
-computeReadCounts true \
-jobLogDir ${runDir}/logs \
-I ~/project-zarlab/mouseBAM/input.list \
-run \
|| exit 1

# Run discovery.

java -cp ${classpath} ${mx} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVDiscovery.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile conf/genstrip_installtest_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R ~/project-zarlab/mouseBAM/chr19_new.fa \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-disableGATKTraversal \
-genderMapFile /u/home/m/mdistler/project-jflint/genomestrip/svtoolkit/installtest/gender.map \
-jobLogDir ${runDir}/logs \
-minimumSize 100 \
-maximumSize 1000000 \
-suppressVCFCommandLines \
-I ~/project-zarlab/mouseBAM/input.list \
-O ${sites} \
-run \
|| exit 1
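
Because a Queue run can take some time before it fails, it may help to confirm up front that every input file passed on the command line exists. The following is a minimal sketch and not part of Genome STRiP; `require_files` is a hypothetical helper name. For instance, the discovery step above passes `-configFile conf/genstrip_installtest_parameters.txt` as a relative path, while preprocessing uses `${SV_DIR}/conf/genstrip_parameters.txt`, so whether the former resolves depends on the working directory. This check is not claimed to explain the NullPointerException; it only rules out missing inputs.

```shell
# Hypothetical helper (not part of Genome STRiP): succeed only if every
# argument is a readable regular file; report each one that is not.
require_files() {
    local status=0 f
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "Missing or unreadable: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use before the discovery step (paths taken from the script above):
# require_files conf/genstrip_installtest_parameters.txt \
#     "${runDir}/metadata/rccache.bin" || exit 1
```

Listing all missing files before returning, rather than stopping at the first one, means a single run of the check reports every problem at once.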

Answers

  • mdistler (Los Angeles, Member)

    Edit: The updated code is below (generates the same error message):

    export SV_DIR=$(cd .. && pwd)
    SV_TMPDIR=./tmpdir

    inputType=bam
    if [ ! -z "$1" ]; then
    inputType="$1"
    fi

    runDir=practice1
    genotypes=practice1.genotypes.vcf
    sites=practice1.discovery.vcf

    which java > /dev/null || exit 1
    which Rscript > /dev/null || exit 1
    which samtools > /dev/null || exit 1

    export PATH=${SV_DIR}/bwa:${PATH}
    export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

    mx="-Xmx4g"
    classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

    mkdir -p ${runDir}/logs || exit 1
    mkdir -p ${runDir}/metadata || exit 1

    java -cp ${classpath} ${mx} -jar ${SV_DIR}/lib/SVToolkit.jar

    java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R ~/project-zarlab/mouseBAM/chr19_new.fa \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -ploidyMapFile ~/project-jflint/mouseBAM/chr19.ploidymap.txt \
    -reduceInsertSizeDistributions false \
    -computeGCProfiles true \
    -computeReadCounts true \
    -jobLogDir ${runDir}/logs \
    -I ~/project-zarlab/mouseBAM/input.list \
    -run \
    || exit 1
    

    java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVDiscovery.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile conf/genstrip_installtest_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R ~/project-zarlab/mouseBAM/chr19_new.fa \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -genderMapFile /u/home/m/mdistler/project-jflint/genomestrip/svtoolkit/installtest/gender.map \
    -jobLogDir ${runDir}/logs \
    -minimumSize 100 \
    -maximumSize 1000000 \
    -suppressVCFCommandLines \
    -I ~/project-zarlab/mouseBAM/input.list \
    -O ${sites} \
    -run \
    || exit 1

  • mdistler (Los Angeles, Member)

    I just changed the script to set -computeGCProfiles to false in the SVPreprocess step. The same error occurs. Any advice, @bhandsaker? Thanks!

  • mdistler (Los Angeles, Member)

    @bhandsaker, thank you so much for your help. This fixed the problem.
