genomestrip throws unhelpful slurm error when using the slurm-drmaa bridge

I am using the SLURM-DRMAA bridge and the pipeline fails with a persistent error: "org.ggf.drmaa.InternalException: slurm_submit_batch_job: Invalid account or account/partition combination specified". Other errors appear when I vary the options, so it makes sense to ask here. If I could get access to the script being submitted to the cluster, I could investigate what the SLURM system objects to in my parameter specification. Can I do that? Judging from the error listing, the Java process catches the error and simply passes the message on to my local system, which I would argue is bad practice.

Note that running locally with -run and no -jobRunner option works, but I want to use the cluster. My system administrator says that since processing stops at the pipeline level and no SLURM submission takes place, he cannot really help me, and he suggested that I try different combinations. Here is what I have tried, with the resulting errors.

- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative "-A sens2016011-bianca" -jobNative "-p node": fails with "org.ggf.drmaa.InternalException: slurm_submit_batch_job: Invalid account or account/partition combination specified"
- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative "-A sens2016011-bianca" -jobNative "-p core" -jobNative "-n 1": fails with "org.ggf.drmaa.InternalException: slurm_submit_batch_job: Invalid account or account/partition combination specified"
- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative "-A sens2016011-bianca" -jobNative "-p core" -jobNative "-n 1" -jobNative "-t 20:00": same error as above
- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative "-A sens2016011-bianca": fails with "Too many cores requested for -p core partition. Minimum cpus requested is 4294967294. To use more than 16 cores, request -p node."
- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative "-A sens2016011-bianca" -jobNative "-N 1": same error as above
- -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative A sens2016011-bianca -jobNative p core -jobNative N 1: fails with "Unable to submit job: Invalid native specification: A sens2016011-bianca p core N 1"
- -run -jobRunner Drmaa: fails with "Use the flag -A to specify an active project with allocation on this cluster."
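
A side note on the "Minimum cpus requested is 4294967294" message in the list above: the number is exactly 2^32 - 2, the 32-bit unsigned value 0xFFFFFFFE, so it looks like a placeholder rather than a real core count. Slurm's headers define an "unset" marker (NO_VAL) with this value, so a plausible reading is that no CPU count reached the scheduler at all. That reading is an assumption; nothing in this thread confirms it. The arithmetic itself can be checked with a short sketch:

```shell
#!/bin/sh
# The CPU count reported in the error message from the question.
reported=4294967294

# 0xFFFFFFFE as a decimal number (2^32 - 2).
sentinel=$((0xFFFFFFFE))

# Compare as strings to avoid any integer-width surprises in the test builtin.
if [ "$reported" = "$sentinel" ]; then
    echo "reported value equals 0xFFFFFFFE: likely an unset field, not a real request"
else
    echo "reported value is an ordinary number"
fi
```

If that reading is right, passing an explicit task or CPU count is what to experiment with first.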

Here is the full error listing:

$ java -Xmx4g \
    -cp /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/SVToolkit.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/Queue.jar \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVPreprocess.q \
    -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVQScript.q \
    -gatk /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar \
    -configFile /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/conf/genstrip_parameters.txt \
    -R /sw/data/uppnex/GATK/2.8/b37/human_g1k_v37.fasta \
    -I /proj/sens2016011/nobackup/melt/data/bam_links/00028285.sorted.bam \
    -md meta \
    -bamFilesAreDisjoint true \
    -jobLogDir /proj/sens2016011/nobackup/genomestrip/tests/logs \
    -run \
    -jobRunner Drmaa \
    -gatkJobRunner Drmaa \
    -jobNative "-A sens2016011-bianca" \
    -jobNative "-p core" \
    -jobNative "-n 1" \
    -jobNative "-t 20:00"
INFO  17:10:33,709 QScriptManager - Compiling 2 QScripts 
INFO  17:11:13,568 QScriptManager - Compilation complete 
INFO  17:11:13,936 HelpFormatter - ---------------------------------------------------------------------- 
INFO  17:11:13,936 HelpFormatter - Queue v3.7.GS-r1748-0-g74bfe0b, Compiled 2018/04/10 10:30:23 
INFO  17:11:13,936 HelpFormatter - Copyright (c) 2012 The Broad Institute 
INFO  17:11:13,936 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  17:11:13,937 HelpFormatter - Program Args: -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVPreprocess.q -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVQScript.q -gatk /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar -configFile /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/conf/genstrip_parameters.txt -R /sw/data/uppnex/GATK/2.8/b37/human_g1k_v37.fasta -I /proj/sens2016011/nobackup/melt/data/bam_links/00028285.sorted.bam -md meta -bamFilesAreDisjoint true -jobLogDir /proj/sens2016011/nobackup/genomestrip/tests/logs -run -jobRunner Drmaa -gatkJobRunner Drmaa -jobNative -A sens2016011-bianca -jobNative -p core -jobNative -n 1 -jobNative -t 20:00 
INFO  17:11:13,937 HelpFormatter - Executing as [email protected] on Linux 3.10.0-862.3.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_171-b10. 
INFO  17:11:13,938 HelpFormatter - Date/Time: 2018/08/28 17:11:13 
INFO  17:11:13,938 HelpFormatter - ---------------------------------------------------------------------- 
INFO  17:11:13,938 HelpFormatter - ---------------------------------------------------------------------- 
INFO  17:11:13,953 QCommandLine - Scripting SVPreprocess 
INFO  17:11:15,238 QCommandLine - Added 190 functions 
INFO  17:11:15,257 QGraph - Generating graph. 
INFO  17:11:15,351 QGraph - Running jobs. 
INFO  17:11:17,092 FunctionEdge - Starting:  'java'  '-Xmx2048m'  '-XX:+UseParallelOldGC'  '-XX:ParallelGCThreads=4'  '-XX:GCTimeLimit=50'  '-XX:GCHeapFreeLimit=10'  '-Djava.io.tmpdir=/castor/project/proj_nobackup/genomestrip/tests/batch/.queue/tmp'  '-cp' '/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/SVToolkit.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/Queue.jar'  'org.broadinstitute.sv.apps.ComputeGenomeSizes'  '-O' '/castor/project/proj_nobackup/genomestrip/tests/batch/meta/genome_sizes.txt'  '-R' '/sw/data/uppnex/GATK/2.8/b37/human_g1k_v37.fasta'    
INFO  17:11:17,093 FunctionEdge - Output written to /proj/sens2016011/nobackup/genomestrip/tests/logs/SVPreprocess-5.out 
ERROR 17:11:17,119 Retry - Caught error during attempt 1 of 4. 
org.ggf.drmaa.InternalException: slurm_submit_batch_job: Invalid account or account/partition combination specified
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:400)
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.runJob(JnaSession.java:79)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.runJob(DrmaaJobRunner.scala:115)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply$mcV$sp(DrmaaJobRunner.scala:93)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:91)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:91)
        at org.broadinstitute.gatk.queue.util.Retry$.attempt(Retry.scala:50)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.start(DrmaaJobRunner.scala:91)
        at org.broadinstitute.gatk.queue.engine.FunctionEdge.start(FunctionEdge.scala:101)
        at org.broadinstitute.gatk.queue.engine.QGraph.startOneJob(QGraph.scala:646)
        at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:507)
        at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:168)
        at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:170)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
        at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:61)
        at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
ERROR 17:11:17,121 Retry - Retrying in 1.0 minute. 

Answers

  • bhandsaker (Member, Broadie, Moderator, admin)

    This really is a slurm / drmaa question, but I will try to offer some help.
    I would start by making sure you can run a simple script a la "echo hello world" and figure out what options you need to pass to srun to get the script to run. Then you have to map these slurm options to -jobNative arguments.

    Note that no "script" is being created or run: the slurm jobs are created through the API, and it is the API that returns the error messages you are seeing.
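
The "hello world" check suggested above could look like the following minimal sketch. The account, partition, task count and time limit are copied from the question and may well be the very values the cluster rejects, so treat them as starting points to vary. The RUN switch and the hello_cmd helper are illustrative names, not part of any tool.

```shell
#!/bin/sh
# Options copied from the question; adjust the account and partition for your site.
SLURM_OPTS="-A sens2016011-bianca -p core -n 1 -t 20:00"

# Print the srun command line that would be executed.
hello_cmd() {
    echo "srun $SLURM_OPTS echo 'hello world'"
}

# Dry run by default; set RUN=1 on a login node to actually submit.
if [ "${RUN:-0}" = "1" ]; then
    # Word splitting of $SLURM_OPTS is intentional: each option is a separate argument.
    srun $SLURM_OPTS echo 'hello world'
else
    echo "dry run: $(hello_cmd)"
fi
```

Once a set of options works with srun directly, the same set is the candidate to pass through -jobNative.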

  • Hi sergiun,

    Could you share the scripts you use to run the pipeline and to submit the job? Here are mine, using SVCNVDiscovery as an example.

    The pipeline script (svcnvdiscovery_single.sh) is:

    classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
    svpreprocess_dir="/proj/yunligrp/users/minzhi/gs_test_svpreprocess_fulllist_success_rerun01"
    rundir="/proj/yunligrp/users/minzhi/gs_test_svcnvdiscovery"
    
    java -Xmx4g -cp ${classpath} \
        org.broadinstitute.gatk.queue.QCommandLine \
        -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
        -S ${SV_DIR}/qscript/SVQScript.q \
        -cp ${classpath} \
        -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
        -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
        -R /proj/yunligrp/users/minzhi/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta \
        -I /proj/yunligrp/users/minzhi/gs_script/NWD.recab.list \
        -genderMapFile /proj/yunligrp/users/minzhi/gs_script/JHS_full_all_male_gender.map \
        -ploidyMapFile /proj/yunligrp/users/minzhi/gs_script/standard_ploidy.map \
        -md ${svpreprocess_dir}/md_tempdir \
        -runDirectory ${rundir} \
        -jobLogDir ${rundir}/logs \
        -intervalList /proj/yunligrp/users/minzhi/gs_script/reference_chromosomes16_1-500000.list \
        -tilingWindowSize 1000 \
        -tilingWindowOverlap 500 \
        -maximumReferenceGapLength 1000 \
        -boundaryPrecision 100 \
        -minimumRefinedLength 500 \
        -jobRunner Drmaa \
        -gatkJobRunner Drmaa \
        -jobNative "--mem-per-cpu=5000 --time=02:00:00 --nodes=1 --ntasks-per-node=16" \
        -jobQueue general \
        -run \
        || exit 1
    

    I use the sbatch script shown below (submit_svcnvdiscovery.sbatch) to submit the job:

    #!/bin/bash
    
    #SBATCH --job-name=gs_svcnvdiscovery
    #SBATCH -p general
    ##SBATCH -p bigmem
    ##SBATCH --qos bigmem_access 
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=10:00:00
    #SBATCH --mem=2GB
    #SBATCH --output=/proj/yunligrp/users/minzhi/output_error/genomestrip/%x_%j.out
    #SBATCH --error=/proj/yunligrp/users/minzhi/output_error/genomestrip/%x_%j.err
    
    module purge
    module load r/3.5.0
    module load samtools/1.8
    module load tabix/0.2.6
    module load drmaa/1.0.7-PSNC
    
    cd /proj/yunligrp/users/minzhi/gs_test_svcnvdiscovery/
    
    sh ../gs_script/svcnvdiscovery_single.sh
    
    exit
    

    I am not sure whether this is the best way to run the pipeline, but it works on the HPCC at our university.

    Best regards,
    Wusheng

  • Thanks for the replies. The staff response passes the ball elsewhere (i.e. to the slurm bridge), even though I specifically pointed out that the genomestrip error handling makes this impossible to debug. From the point of view of the slurm-drmaa developers, the fault lies with the program, since they have nothing to work with: no task gets submitted to the slurm-drmaa bridge. They are logically right, and you are logically wrong.

    @Wusheng, I simply run the command line (after loading my modules and configuring the right paths):

    java -Xmx4g \
        -cp /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/SVToolkit.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/Queue.jar \
        org.broadinstitute.gatk.queue.QCommandLine \
        -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVPreprocess.q \
        -S /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/qscript/SVQScript.q \
        -gatk /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/lib/gatk/GenomeAnalysisTK.jar \
        -configFile /proj/sens2016011/nobackup/genomestrip/lib/svtoolkit/conf/genstrip_parameters.txt \
        -R /sw/data/uppnex/GATK/2.8/b37/human_g1k_v37.fasta \
        -I /proj/sens2016011/nobackup/melt/data/bam_links/00028285.sorted.bam \
        -md meta \
        -bamFilesAreDisjoint true \
        -jobLogDir /proj/sens2016011/nobackup/genomestrip/tests/logs \
        -run \
        -jobRunner Drmaa \
        -gatkJobRunner Drmaa \
        -jobNative "-A sens2016011-bianca" \
        -jobNative "-p core" \
        -jobNative "-n 1" \
        -jobNative "-t 20:00"

    The full code uses a workflow library (snakemake), and the script it generates fails in the same way. This is not a scripting issue: if the command above passed, I could fix my script as well.

  • bhandsaker (Member, Broadie, Moderator, admin)

    I would suggest debugging this in two steps:
    1. Create (and post) a small example script that issues a "hello world" command using srun that works successfully in your environment and uses the options you want to use above.
    2. Write a small Queue script that runs the same "hello world" command through Queue and attempts to pass the same options through -jobNative.
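
For the mapping from srun options to -jobNative in the two steps above, here is a small sketch. It relies on one observation from the question: the "Invalid native specification: A sens2016011-bianca p core N 1" error suggests that Queue joins all -jobNative values with spaces into a single native specification. The working script earlier in this thread also passes everything in one quoted -jobNative string. The helper name job_native_args is made up for illustration.

```shell
#!/bin/sh
# Turn a list of SLURM options into a single quoted -jobNative argument.
# Assumption: the DRMAA library accepts the same option spellings as srun;
# check the slurm-drmaa documentation for the options it actually supports.
job_native_args() {
    # "$*" joins all arguments with single spaces.
    printf '%s "%s"\n' '-jobNative' "$*"
}

# Example: the options tried in the question, emitted as one argument.
job_native_args -A sens2016011-bianca -p core -n 1 -t 20:00
```

The printed text can be pasted onto the Queue command line in place of the separate -jobNative arguments, which makes it easier to compare against the exact option string that worked with srun.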

  • sergiun (Member)
    edited September 26

    @bhandsaker, can you give me an example Queue "hello world" script that can be test-run on a grid? The only thing I could find about Queue in the docs is that people/tools have moved to WDL.
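
No such script was posted in this thread. As a starting point, the sketch below writes a minimal QScript modeled on the HelloWorld example that shipped with Queue and prints a command line for running it through DRMAA. The QScript body is reproduced from memory and should be checked against the Queue version in use. QUEUE_JAR is a placeholder variable, and the -jobNative string is simply the one from the question.

```shell
#!/bin/sh
# Write a minimal QScript that runs a single "echo hello world" job.
cat > HelloWorld.scala <<'EOF'
import org.broadinstitute.gatk.queue.QScript

class HelloWorld extends QScript {
  def script() {
    add(new CommandLineFunction {
      def commandLine = "echo hello world"
    })
  }
}
EOF

# Placeholder: point this at the Queue.jar of your own installation.
QUEUE_JAR="${QUEUE_JAR:-/path/to/Queue.jar}"

# Print the command rather than running it, so the options can be reviewed first.
echo "java -cp $QUEUE_JAR org.broadinstitute.gatk.queue.QCommandLine -S HelloWorld.scala -jobRunner Drmaa -jobNative \"-A sens2016011-bianca -p core -n 1 -t 20:00\" -run"
```

If this single job is rejected with the same "Invalid account or account/partition combination" error, the problem can be reproduced without genomestrip, which gives the cluster administrators something concrete to look at.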