QFunction and Command Line Options

Geraldine_VdAuwera Posts: 5,981 Administrator, GATK Developer
edited February 3 in Queue

These are the most commonly used Queue command line options. For a complete and up-to-date list, run Queue with -help. Individual QScripts may also define their own command line options.

1. Queue Command Line Options

| Command Line Argument | Description | Default |
|---|---|---|
| -run | If passed, the scripts are run. If not passed, a dry run is executed. | dry run |
| -jobRunner <jobrunner> | The job runner to dispatch jobs. Setting to Lsf706, GridEngine, or Drmaa will dispatch jobs to LSF or Grid Engine using the job settings (see below). Defaults to Shell, which runs jobs on a local shell one at a time. | Shell |
| -bsub | Alias for -jobRunner Lsf706 | not set |
| -qsub | Alias for -jobRunner GridEngine | not set |
| -status | Prints a summary of progress. If a QScript is currently running via -run, you can run the same command line with -status instead to print a summary of progress. | not set |
| -retry <count> | Retries a QFunction that returns a non-zero exit code up to count times. The QFunction must not have set jobRestartable to false. | 0 = no retries |
| -startFromScratch | Restarts the graph from the beginning. If not specified, then for each output file specified on a QFunction (e.g. /path/to/output.file), Queue will not re-run the job if a .done file is found for all of the outputs (e.g. /path/to/.output.file.done). | use .done files to determine if jobs are complete |
| -keepIntermediates | By default Queue deletes the output files of QFunctions that set .isIntermediate to true. If passed, those files are kept. | delete intermediate files |
| -statusTo <email> | Email address to send status to whenever a) a job fails, or b) Queue has run all the functions it can run and is exiting. | not set |
| -statusFrom <email> | Email address to send status emails from. | user@local.domain |
| -dot <file> | If set, renders the job graph to a dot file. | not rendered |
| -l <logging_level> | The minimum level of logging: DEBUG, INFO, WARN, or FATAL. | INFO |
| -log <file> | Sets the location to save log output in addition to standard out. | not set |
| -debug | Sets the logging to include a lot of debugging information (SLOW!). | not set |
| -jobReport | Path to write the job report text file. If R is installed and available on the $PATH, a PDF visualizing the job report is also generated. | jobPrefix.jobreport.txt |
| -disableJobReport | Disables writing the job report. | not set |
| -help | Lists all of the command line arguments with their descriptions. | not set |
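
To make the .done convention described for -startFromScratch concrete, here is a minimal Python sketch. It assumes only the naming shown in the example (/path/to/output.file maps to /path/to/.output.file.done). The function names done_marker and can_skip are hypothetical, and this is an illustration of the rule, not Queue's actual implementation.

```python
from pathlib import PurePosixPath


def done_marker(output_path):
    """Return the .done marker path for an output file.

    Assumption: the marker sits next to the output file, with a leading
    dot and a ".done" suffix, as in the example from the table.
    """
    path = PurePosixPath(output_path)
    return str(path.with_name("." + path.name + ".done"))


def can_skip(outputs, existing_files, start_from_scratch=False):
    """Hypothetical helper: skip a job only if every output has a marker.

    existing_files is a set of paths standing in for the file system,
    so the sketch does not touch the disk.
    """
    if start_from_scratch or not outputs:
        return False
    return all(done_marker(out) in existing_files for out in outputs)


print(done_marker("/path/to/output.file"))  # /path/to/.output.file.done
```

With this rule, a job with two outputs but only one marker would be re-run, and passing start_from_scratch=True re-runs everything regardless of markers.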

2. QFunction Options

The following options can be specified on the command line or overridden per QFunction.

| Command Line Argument | QFunction Property | Description | Default |
|---|---|---|---|
| -jobPrefix | .jobName | The unique name of the job. Used to prefix directories and log files. Use -jobPrefix on the Queue command line to replace the default prefix Q-<processid>@<host>. | <jobNamePrefix>-<jobNumber> |
| N/A | .jobOutputFile | Captures stdout, and also stderr if jobErrorFile is null. | <jobName>.out |
| N/A | .jobErrorFile | If not null, captures stderr. | null |
| N/A | .commandDirectory | The directory to execute the command line from. | current directory |
| -jobProject | .jobProject | The project name for the job. | default job runner project |
| -jobQueue | .jobQueue | The queue to dispatch the job to. | default job runner queue |
| -jobPriority | .jobPriority | The dispatch priority for the job. Lowest priority = 0. Highest priority = 100. | default job runner priority |
| -jobNative | .jobNativeArgs | Native args to pass to the job runner. Currently only supported in GridEngine and Drmaa. The string is concatenated to the native arguments passed over DRMAA. Example: -w n. | none |
| -jobResReq | .jobResourceRequests | Resource requests to pass to the job runner. On GridEngine this is multiple -l <req>. On LSF a single -R <req> is generated. | memory reservations and limits on LSF and GridEngine |
| -jobEnv | .jobEnvironmentNames | Predefined environment names to pass to the job runner. On GridEngine this is -pe <env>. On LSF this is -a <env>. | none |
| -memLimit | .memoryLimit | The memory limit for the job in gigabytes. Used to populate the variables residentLimit and residentRequest, which can also be set separately. | default job runner memory limit |
| -resMemLimit | .residentLimit | Limit for the resident memory in gigabytes. On GridEngine this is -l mem_free=<mem>. On LSF this is -R rusage[mem=<mem>]. | memoryLimit * 1.2 |
| -resMemReq | .residentRequest | Requested amount of resident memory in gigabytes. On GridEngine this is -l h_rss=<mem>. On LSF this is -R rusage[select=<mem>]. | memoryLimit |
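
The relationship between memoryLimit, residentLimit, and residentRequest can be sketched in a few lines of Python. This only restates the defaults listed in the table (residentLimit defaults to memoryLimit * 1.2, residentRequest defaults to memoryLimit, and either can be set separately). The function name resolve_memory is hypothetical and the sketch is not taken from the Queue source.

```python
def resolve_memory(memory_limit=None, resident_limit=None, resident_request=None):
    """Fill in resident memory settings (in gigabytes) from memoryLimit.

    Explicitly set values are kept. Missing values fall back to the
    defaults listed in the table. If nothing is set, values stay None,
    standing in for "use the job runner default".
    """
    if memory_limit is not None:
        if resident_limit is None:
            resident_limit = memory_limit * 1.2
        if resident_request is None:
            resident_request = memory_limit
    return {
        "memoryLimit": memory_limit,
        "residentLimit": resident_limit,
        "residentRequest": resident_request,
    }


# Only -memLimit 4 given: both resident values are derived from it.
print(resolve_memory(4))

# -memLimit 4 with -resMemLimit 6: the explicit limit is kept.
print(resolve_memory(4, resident_limit=6))
```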

3. Email Status Options

| Command Line Argument | Description | Default |
|---|---|---|
| -emailHost <hostname> | SMTP host name | localhost |
| -emailPort <port> | SMTP port | 25 |
| -emailTLS | If set, uses TLS. | not set |
| -emailSSL | If set, uses SSL. | not set |
| -emailUser <username> | If set along with emailPass or emailPassFile, authenticates the email with this username. | not set |
| -emailPassFile <file> | If emailUser is also set, authenticates the email with the contents of the file. | not set |
| -emailPass <password> | If emailUser is also set, authenticates the email with this password. NOT SECURE: Use emailPassFile instead! | not set |

Geraldine Van der Auwera, PhD

Comments

  • flescai Posts: 51 Member

    I successfully launched a .scala script with HaplotypeCaller, but the Java virtual machine is initialised with these memory settings:

    INFO 23:20:32,815 FunctionEdge - Done: 'java' '-Xmx2048m'

    and some jobs failed, reporting this suggestion:

    ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java

    How can I modify the -Xmx and -Xms settings of the java processes launched by the different steps of my script?

    thanks!

  • Geraldine_VdAuwera Posts: 5,981 Administrator, GATK Developer

    Look at section 2. QFunction Options of this article. You will find the command-line options that you can use to set the memory limits.

    Geraldine Van der Auwera, PhD

  • flescai Posts: 51 Member
    edited September 2012

    Thanks Geraldine, it wasn't quite clear to me.

    But ".memoryLimit" doesn't seem to specify the JVM memory limit (i.e. -Xmx); it seems to set job limits via scheduler parameters. Is that correct? The box says it is used to populate the variables residentLimit and residentRequest.

    I tried a dry run with -memLimit 8 added, and I still get jobs launched with

    INFO 16:13:33,682 QGraph - Pending: 'java' '-Xmx2048m'

    How can I then affect this parameter?

    thanks!

  • Geraldine_VdAuwera Posts: 5,981 Administrator, GATK Developer

    The -memLimit argument is what you want (it is the memory passed to the jvm with -Xmx) -- and it should work.

    You may need to check your script to see if the memory limit is being overridden somewhere. We do that in a lot of our scripts, to assign different memory limits to different types of jobs (say I want to give 2 gigs to recalibration jobs, but 4 gigs to genotyping jobs). So that might be the case in the example scripts that you based your script on. Anything in the script will override what's passed on the command line.
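
The precedence described in this reply can be illustrated with a short Python sketch. It assumes that a value set in the script wins over -memLimit, as stated above, and that the limit in gigabytes is turned into a -Xmx flag in megabytes using 1 GB = 1024 MB. That conversion is an assumption chosen to match the '-Xmx2048m' log lines in this thread. The function names are hypothetical and this is not Queue's code.

```python
import math


def effective_memory_limit(script_value=None, command_line_value=None):
    """Pick the memory limit in gigabytes.

    A value set in the script takes precedence over the command line.
    None means neither was set.
    """
    if script_value is not None:
        return script_value
    return command_line_value


def xmx_flag(memory_limit_gb):
    """Format a JVM heap flag in megabytes (assumes 1 GB = 1024 MB)."""
    return "-Xmx%dm" % math.ceil(memory_limit_gb * 1024)


# Script sets 2 GB and the command line passes -memLimit 8: the script wins.
print(xmx_flag(effective_memory_limit(script_value=2, command_line_value=8)))

# Script sets nothing: the command line value is used.
print(xmx_flag(effective_memory_limit(command_line_value=8)))
```

Under these assumptions, a script that hard-codes 2 gigabytes would still produce '-Xmx2048m' even with -memLimit 8 on the command line, which is one possible explanation for the behaviour reported above.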

    Geraldine Van der Auwera, PhD

  • iki Posts: 1 Member

    I'm not entirely sure what the jobEnv option is supposed to do. The version I have has both this option, and the jobParaEnv option. If it's supposed to use GridEngine's pe option (parallel environment), then these are duplicates -- it is clear that jobParaEnv is supposed to set the parallel environment, but it is not clear whether jobEnv is supposed to do the same, or set an environment variable for the job. If jobEnv is supposed to set an environment variable for the job, then it is not doing it correctly -- '-v' is the appropriate option for qsub to do that in GridEngine.

    A clarification on this would be really helpful.
