Error: "The 1th sequences in the reference ./resources/gatk/GRCh38 is named chr1"

trbtrb San FranciscoMember
edited November 2014 in GenomeSTRiP

I'm trying to run the Genome STRiP pre-processing step on BAMs mapped to b38 (using mask and ploidy files I created for b38). It crashes with the GATK error shown below. The command I'm using comes first, followed by the error. (In case it is relevant, I'm running this on LSF version 9 via DRMAA.)

script-----------------

export SV_DIR=./svtoolkit/
SV_TMPDIR=./tmp
genome_fasta=./resources/gatk/GRCh38/GRCh38.fa
mask_fasta=./resources/gatk/GRCh38/svtoolkit/GRCh38.mask.100.fasta
cn2_mask_fasta=./resources/gatk/GRCh38/svtoolkit/GRCh38.cn2_mask.fasta
ploidy_file=./resources/gatk/GRCh38/svtoolkit/humgen_g1k_v38_ploidy.map

runDir=drmaa_test
bam=pilot_recalBam.list
sites=upto100kb.discovery.vcf
genotypes=upto100kb.genotypes.vcf
gender_file=pilot_sex.map

export PATH=${SV_DIR}/bwa:${PATH}
export LD_LIBRARY_PATH=${SV_DIR}/bwa:${LD_LIBRARY_PATH}

mx="-Xmx4g"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"

# Display version information.
java -cp ${classpath} ${mx} -jar ${SV_DIR}/lib/SVToolkit.jar
# Run preprocessing.
java -cp ${classpath} ${mx} \
org.broadinstitute.sting.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
--disableJobReport \
-cp ${classpath} \
-configFile genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R ${genome_fasta} \
-genomeMaskFile ${mask_fasta} \
-genderMapFile ${gender_file} \
-ploidyMapFile ${ploidy_file} \
-copyNumberMaskFile ${cn2_mask_fasta} \
-runDirectory ${runDir} \
-md ${runDir}/metadata \
-disableGATKTraversal \
-bamFilesAreDisjoint \
-useMultiStep \
-reduceInsertSizeDistributions \
-computeGCProfiles \
-jobLogDir ${runDir}/logs \
-I ${bam} \
-run \
-jobRunner Drmaa \
-jobNative "-R rusage[mem=8]" -jobNative "-q long" \
-jobProject oct14_pilot \
-jobQueue long

Error ------------------

INFO 20:34:23,569 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:34:23,571 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-g5671483, Compiled 2013/08/22 15:09:27
INFO 20:34:23,571 HelpFormatter - Copyright (c) 2010 The Broad Institute
INFO 20:34:23,572 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 20:34:23,578 HelpFormatter - Program Args: -T ComputeReadDepthCoverageWalker -R ./resources/gatk/GRCh38/GRCh38.fa -O samp1.recal.depth.txt -disableGATKTraversal true -md drmaa_test/metadata -ploidyMapFile ./resources/gatk/GRCh38/svtoolkit/humgen_g1k_v38_ploidy.map -genomeMaskFile ./resources/gatk/GRCh38/svtoolkit/GRCh38.mask.100.fasta -minMapQ 10 -insertSizeRadius 10.0
INFO 20:34:23,578 HelpFormatter - Date/Time: 2014/11/12 20:34:23
INFO 20:34:23,578 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:34:23,578 HelpFormatter - --------------------------------------------------------------------------------
INFO 20:34:23,679 GenomeAnalysisEngine - Strictness is SILENT
INFO 20:34:23,996 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 20:34:24,104 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files
INFO 20:34:24,184 GenomeAnalysisEngine - Done creating shard strategy
INFO 20:34:24,184 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 20:34:24,184 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
INFO 20:34:24,186 MetaData - Opening metadata ...
INFO 20:34:24,186 MetaData - Adding metadata directory drmaa_test/metadata ...
INFO 20:34:24,343 MetaData - Opened metadata.
#INFO: ReadCountAlgorithm: detected metadata version 1, forcing legacy behavior
INFO 20:34:26,526 MetaData - Loading insert size distributions ...
INFO 20:34:26,881 ComputeReadDepthCoverageWalker - Using genome mask ./resources/gatk/GRCh38/svtoolkit/GRCh38.mask.100.fasta
WARN 20:34:29,498 RestStorageService - Error Response: PUT '/GATK_Run_Reports/8hAP9b1KaAHyGPA4wBv6TLRp2lFOgSdc.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 814, Content-MD5: h0E4ckiYeugki8Q6nT/9RQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 8741387248987ae8248bc43a9d3ffd45, Date: Thu, 13 Nov 2014 04:34:28 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:JpdDFUyt67wkkCosce9QEUMuV3M=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-358.6.1.el6.x86_64; amd64; en; JVM 1.7.0_19), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 353E7628C07E489E, x-amz-id-2: s2LdMKwfftQXGX3qj/omBuz+NhFfDygFlvoLBNU+XBHVPh5+xkM2QAVPTQMqCrYI, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 13 Nov 2014 04:34:28 GMT, Connection: close, Server: AmazonS3]

ERROR ------------------------------------------------------------------------------------------
ERROR stack trace

java.lang.RuntimeException: The 1th sequences in the reference ./resources/gatk/GRCh38 is named chr1
at org.broadinstitute.sv.metadata.depth.ComputeReadDepthCoverageWalker.createDefaultReadDepthIntervalList(ComputeReadDepthCoverageWalker.java:202)
at org.broadinstitute.sv.metadata.depth.ComputeReadDepthCoverageWalker.initialize(ComputeReadDepthCoverageWalker.java:124)
at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:84)
at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:125)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:79)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:59)

ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-g5671483):
ERROR
ERROR Please check the documentation guide to see if this is a known problem
ERROR If not, please post the error, with stack trace, to the GATK forum
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions http://www.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: The 1th sequences in the reference ./resources/gatk/GRCh38 is named chr1
ERROR ------------------------------------------------------------------------------------------
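
The exception names the first sequence in the reference, so it can help to see exactly which sequence names the reference and mask FASTA files contain. The following is a minimal sketch, assuming plain-text (uncompressed) FASTA files; the function name `list_fasta_names` is made up for illustration and is not part of SVToolkit or GATK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SVToolkit): print one sequence name per
# FASTA record, i.e. the header text after '>' up to the first whitespace.
list_fasta_names() {
    awk '/^>/ { print substr($1, 2) }' "$1"
}

# Example, using the reference path from the script above:
#   list_fasta_names ./resources/gatk/GRCh38/GRCh38.fa | head
```

Running the same function on the genome mask FASTA and comparing the two listings (for example with `diff`) is one way to confirm that both files use the same names in the same order.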

Answers

  • trbtrb San FranciscoMember

    Thanks. In addition to autosomes I added chrX, chrY, and chrMT to the intervals.list and I get the error below. My ploidy file looks like this:
    chrX 1 156040895 F 2
    chrX 1 154931043 M 1
    chrY 1 57227415 F 0
    chrY 1 57227415 M 1
    * * * * 2

    Script

    java -cp ${classpath} ${mx} \
    org.broadinstitute.sting.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R ${genome_fasta} \
    -genomeMaskFile ${mask_fasta} \
    -genderMapFile ${gender_file} \
    -ploidyMapFile ${ploidy_file} \
    -copyNumberMaskFile ${cn2_mask_fasta} \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -bamFilesAreDisjoint \
    -useMultiStep \
    -reduceInsertSizeDistributions \
    -computeGCProfiles \
    -jobLogDir ${runDir}/logs \
    -I ${bam} \
    -run \
    -computeMetadataOverInterval interval.list \
    -jobRunner Drmaa \
    -jobNative "-R rusage[mem=8]" -jobNative "-q long" \
    -jobProject oct14_pilot \
    -jobQueue long

    Error

    INFO 20:55:25,176 HelpFormatter - --------------------------------------------------------------------------------
    INFO 20:55:25,178 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.5-2-g5671483, Compiled 2013/08/22 15:09:27
    INFO 20:55:25,178 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:55:25,178 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 20:55:25,182 HelpFormatter - Program Args: -T ComputeReadDepthCoverageWalker -R ./GRCh38/GRCh38.fa -O ./chrlist_test/metadata/depth/01089200.recal.depth.txt -disableGATKTraversal true -md chrlist_test/metadata -ploidyMapFile ./GRCh38/svtoolkit/humgen_g1k_v38_ploidy.map -genomeMaskFile ./GRCh38/svtoolkit/GRCh38.mask.100.fasta -genomeInterval interval.list -minMapQ 10 -insertSizeRadius 10.0
    INFO 20:55:25,182 HelpFormatter - Date/Time: 2014/11/14 20:55:25
    INFO 20:55:25,182 HelpFormatter - --------------------------------------------------------------------------------
    INFO 20:55:25,182 HelpFormatter - --------------------------------------------------------------------------------
    INFO 20:55:25,308 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:55:25,560 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO 20:55:25,750 GenomeAnalysisEngine - Creating shard strategy for 0 BAM files
    INFO 20:55:25,799 GenomeAnalysisEngine - Done creating shard strategy
    INFO 20:55:25,799 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 20:55:25,800 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
    INFO 20:55:25,801 MetaData - Opening metadata ...
    INFO 20:55:25,806 MetaData - Adding metadata directory chrlist_test/metadata ...
    INFO 20:55:25,845 MetaData - Opened metadata.
    #INFO: ReadCountAlgorithm: detected metadata version 1, forcing legacy behavior
    INFO 20:55:25,936 MetaData - Loading insert size distributions ...
    INFO 20:55:25,956 ComputeReadDepthCoverageWalker - Using genome mask ./GRCh38/svtoolkit/GRCh38.mask.100.fasta
    WARN 20:55:26,874 RestStorageService - Error Response: PUT '/GATK_Run_Reports/orc74UXvczuanlBMEvpavMZliBOfaLbv.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 768, Content-MD5: j3Lk9Mlj78HF4ecu46ReBA==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 8f72e4f4c963efc1c5e1e72ee3a45e04, Date: Sat, 15 Nov 2014 04:55:26 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:2I/EUd6gs2fCAg/uxSV0sdSuedg=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-358.6.1.el6.x86_64; amd64; en; JVM 1.7.0_19), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 2A8DD2033C422560, x-amz-id-2: KWNPuACQrWkXG5axY2wS43yMSlHPR19xCGbExAzYb34VdcFYIh4LGuCMHGP1cZhU, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Sat, 15 Nov 2014 04:55:26 GMT, Connection: close, Server: AmazonS3]

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.RuntimeException: Sequence chrX has gender-independent ploidy of NaN
    at org.broadinstitute.sv.metadata.depth.ComputeReadDepthCoverageWalker.validateReadDepthIntervalSet(ComputeReadDepthCoverageWalker.java:183)
    at org.broadinstitute.sv.metadata.depth.ComputeReadDepthCoverageWalker.initialize(ComputeReadDepthCoverageWalker.java:128)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:84)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:125)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:79)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:59)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.5-2-g5671483):
    ERROR
    ERROR Please check the documentation guide to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Sequence chrX has gender-independent ploidy of NaN
    ERROR ------------------------------------------------------------------------------------------
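
One possible reading of this message (an assumption; nothing in the thread confirms it) is that the read-depth step needs a single ploidy that holds for both sexes on every sequence in the interval list, which chrX and chrY do not have in the ploidy map above. The sketch below lists the interval-list entries whose sequence has a sex-specific row in the ploidy map. It assumes the ploidy map has five whitespace-separated columns as in the post (sequence, start, end, sex, ploidy) and that the interval list holds one sequence name or `name:start-end` per line. The function name `sex_specific_intervals` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of SVToolkit).
# Usage: sex_specific_intervals PLOIDY_MAP INTERVAL_LIST
# Prints each interval-list entry whose sequence has at least one ploidy-map
# row with a specific sex (F or M) in column 4.
sex_specific_intervals() {
    awk '
        # First file: remember sequences that have sex-specific ploidy rows.
        FILENAME == ARGV[1] {
            if ($4 == "F" || $4 == "M") sexed[$1] = 1
            next
        }
        # Second file: strip an optional ":start-end" suffix and look it up.
        NF > 0 {
            seq = $1
            sub(/:.*/, "", seq)
            if (seq in sexed) print $1
        }
    ' "$1" "$2"
}

# Example, using the file names from this thread:
#   sex_specific_intervals humgen_g1k_v38_ploidy.map interval.list
```

If the function prints chrX and chrY for the files in this post, a run with those entries removed from the interval list would show whether they are what triggers the error.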
  • wheatwillwheatwill China,Wuhan,Huazhong agriculture university Member

    Hi,
    I am trying to run SVToolkit.2.00.1529 on rice, which is a self-pollinating plant and has no genders. I got this error: "

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace
    java.lang.RuntimeException: The 1th sequences in the reference /home/ligw/genome_strip/svtoolkit/installtest/data is named chr13"
    Following the comments above, I added "-computeMetadataOverInterval interval.list" to my script, but it doesn't seem to work.
    I now get this error, followed by the command I ran:
    "SVToolkit version 2.00 (build 1529)
    Build date: 2015/01/25 21:14:07
    Web site: http://www.broadinstitute.org/software/genomestrip
    INFO 14:05:22,034 QScriptManager - Compiling 2 QScripts
    INFO 14:05:34,606 QScriptManager - Compilation complete

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    org.broadinstitute.gatk.utils.commandline.InvalidArgumentException:
    Argument with name 'computeMetadataOverInterval' isn't defined.
    at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(ParsingEngine.java:306)
    at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(ParsingEngine.java:279)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:216)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version ):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Argument with name 'computeMetadataOverInterval' isn't defined.
    ERROR ------------------------------------------------------------------------------------------"

    java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile conf/genstrip_installtest_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R data/rice7_reference.fasta \
    -genomeMaskFile data/rice7_reference_svmask.fasta \
    -genderMapFile data/input_bam_files_gender.map \
    -computeMetadataOverInterval data/interval.list \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -disableGATKTraversal \
    -useMultiStep \
    -reduceInsertSizeDistributions true \
    -computeReadCounts true \
    -computeGCProfiles true \
    -jobLogDir ${runDir}/logs \
    -I ${bam} \
    -run \
    || exit 1
    Thanks!

  • bhandsakerbhandsaker Member, Broadie, Moderator admin

    This post relates to Genome STRiP version 1.04, not version 2.0.

    A number of the arguments have changed. The documentation for Genome STRiP 2.0 is available online here:
    http://www.broadinstitute.org/software/genomestrip/documentation

    Check the GS 2.0 documentation for SVPreprocess and see if following that helps.

    A snapshot of the documentation is also in each tarball, but the online documentation may be more up to date as we are working to make it more complete. Some of the older (but still useful) utilities are still only documented on the GATK forum.
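
Because argument names differ between releases, a rough first check before launching a run is to search the installed QScripts for the argument name. This is only a heuristic sketch: it assumes the argument name appears literally in the `.q` files passed with `-S`, which this thread does not confirm, and the function name `arg_mentioned` is hypothetical. The documentation for the installed version remains the authoritative source.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of SVToolkit): succeed (exit status 0) if the
# literal argument name appears as a whole word in any of the given files.
arg_mentioned() {
    local name=$1
    shift
    grep -q -F -w -- "$name" "$@"
}

# Example, using the argument from the post above:
#   arg_mentioned computeMetadataOverInterval "${SV_DIR}"/qscript/*.q \
#       || echo "computeMetadataOverInterval not found in the installed QScripts"
```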
