The current GATK version is 3.7-0

Queue custom job schedulers

Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie
edited April 2014 in Pipelining with Queue

Implementing a Queue JobRunner

The following Scala methods must be implemented for a new JobRunner. See the GridEngine and LSF implementations for complete, concrete examples.

1. class JobRunner.start()

start() should copy the settings from the CommandLineFunction into your job scheduler and invoke the command via sh <jobScript>. As an example of what needs to be implemented, here are the current contents of the start() method in MyCustomJobRunner, which contains pseudocode.

  def start(): Unit = {
    // TODO: Copy settings from function to your job scheduler syntax.

    val mySchedulerJob = new ...

    // Use at most the first 4000 characters of the description (or your scheduler's maximum) as the display name
    mySchedulerJob.displayName = function.description.take(4000)

    // Set the output file for stdout
    mySchedulerJob.outputFile = function.jobOutputFile.getPath

    // Set the current working directory
    mySchedulerJob.workingDirectory = function.commandDirectory.getPath

    // If the error file is set specify the separate output for stderr
    if (function.jobErrorFile != null) {
      mySchedulerJob.errFile = function.jobErrorFile.getPath
    }

    // If a project name is set specify the project name
    if (function.jobProject != null) {
      mySchedulerJob.projectName = function.jobProject
    }

    // If the job queue is set specify the job queue
    if (function.jobQueue != null) {
      mySchedulerJob.queue = function.jobQueue
    }

    // If the resident set size is requested pass on the memory request
    if (residentRequestMB.isDefined) {
      mySchedulerJob.jobMemoryRequest = "%dM".format(residentRequestMB.get.ceil.toInt)
    }

    // If the resident set size limit is defined specify the memory limit
    if (residentLimitMB.isDefined) {
      mySchedulerJob.jobMemoryLimit = "%dM".format(residentLimitMB.get.ceil.toInt)
    }

    // If the priority is set (user specified Int) specify the priority
    if (function.jobPriority.isDefined) {
      mySchedulerJob.jobPriority = function.jobPriority.get
    }

    // Instead of running the function.commandLine, run "sh <jobScript>"
    mySchedulerJob.command = "sh " + jobScript

    // Store the status so it can be returned in the status method.
    myStatus = RunnerStatus.RUNNING

    // Start the job and store the id so it can be killed in tryStop
    myJobId = mySchedulerJob.start()
  }
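The pseudocode above depends on Queue's classes and on your scheduler's API, so it cannot run on its own. The sketch below pulls the same settings translation into a standalone, testable function. `JobSettings`, `SchedulerJob`, and `fromSettings` are hypothetical names standing in for the CommandLineFunction fields and your scheduler's job object, and `Option` replaces the `null` checks; it is an illustration of the mapping under those assumptions, not Queue's own code.

```scala
// Hypothetical stand-in for the CommandLineFunction fields that start() reads.
// Option replaces the null checks used in the pseudocode.
final case class JobSettings(
    description: String,
    jobOutputFile: String,
    commandDirectory: String,
    jobErrorFile: Option[String] = None,
    jobProject: Option[String] = None,
    jobQueue: Option[String] = None,
    residentRequestMB: Option[Double] = None,
    residentLimitMB: Option[Double] = None,
    jobPriority: Option[Int] = None)

// Hypothetical scheduler-side job description; swap in your scheduler's own API.
final case class SchedulerJob(
    displayName: String,
    outputFile: String,
    workingDirectory: String,
    command: String,
    errFile: Option[String],
    projectName: Option[String],
    queue: Option[String],
    memoryRequest: Option[String],
    memoryLimit: Option[String],
    priority: Option[Int])

object SchedulerJob {
  // Assumed display-name limit; use whatever your scheduler allows.
  val MaxDisplayName = 4000

  // Round a fractional megabyte value up, e.g. 2048.5 becomes "2049M".
  private def megabytes(mb: Double): String = "%dM".format(mb.ceil.toInt)

  // Copy every setting across, then run "sh <jobScript>" instead of the raw command line.
  def fromSettings(s: JobSettings, jobScript: String): SchedulerJob =
    SchedulerJob(
      displayName = s.description.take(MaxDisplayName),
      outputFile = s.jobOutputFile,
      workingDirectory = s.commandDirectory,
      command = "sh " + jobScript,
      errFile = s.jobErrorFile,
      projectName = s.jobProject,
      queue = s.jobQueue,
      memoryRequest = s.residentRequestMB.map(megabytes),
      memoryLimit = s.residentLimitMB.map(megabytes),
      priority = s.jobPriority)
}
```

Keeping the translation in a pure function like this makes it easy to unit-test before wiring it into start(), where the only remaining work is submitting the job and storing its id.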

2. class JobRunner.status

The status method should return one of the enum values from org.broadinstitute.sting.queue.engine.RunnerStatus:

  • RunnerStatus.RUNNING
  • RunnerStatus.DONE
  • RunnerStatus.FAILED
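Because Queue only understands these three values, every native scheduler state has to collapse into one of them. The sketch below assumes LSF-style state names (PEND, RUN, the suspended states, DONE, EXIT) and defines its own `RunnerStatus` enumeration as a local stand-in so it compiles without Queue on the classpath; `StatusMapping.toRunnerStatus` is a hypothetical helper, not part of Queue.

```scala
// Local stand-in for Queue's RunnerStatus so this sketch compiles on its own.
object RunnerStatus extends Enumeration {
  val RUNNING, DONE, FAILED = Value
}

object StatusMapping {
  // Collapse an LSF-style job state into one of Queue's three statuses.
  def toRunnerStatus(schedulerState: String): RunnerStatus.Value =
    schedulerState match {
      case "DONE" => RunnerStatus.DONE
      case "EXIT" => RunnerStatus.FAILED
      // Pending, running and suspended jobs (PEND, RUN, PSUSP, USUSP, SSUSP)
      // and any unrecognized state are still "in flight" from Queue's point of view.
      case _ => RunnerStatus.RUNNING
    }
}
```

Treating unrecognized states as still running is a deliberate choice in this sketch: it keeps Queue waiting instead of failing a job on a transient scheduler hiccup, at the cost of possibly waiting on a job that will never finish.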

3. object JobRunner.init()

Add any initialization code to the companion object static initializer. See the LSF or GridEngine implementations for how this is done.
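A Scala `object` body runs exactly once, the first time the object is accessed, which is why it can play the role of a Java static initializer. The sketch below uses a hypothetical `MyJobRunner` and a counter to show that creating several runner instances triggers the companion's setup only once; `sessionId` is a placeholder for whatever one-time scheduler setup (loading a native library, opening a session) your runner actually needs.

```scala
// Hypothetical runner class; each instance reuses the companion's one-time setup.
class MyJobRunner(val name: String) {
  // Touching the companion object forces its initializer to run (once).
  def sessionId: String = MyJobRunner.sessionId
}

object MyJobRunner {
  // Counts how many times the initializer body ran (for illustration only).
  var initCount = 0

  // This block runs once, on first access to MyJobRunner, like a static initializer.
  // Placeholder for real setup such as connecting to the scheduler.
  val sessionId: String = {
    initCount += 1
    "session-" + initCount
  }
}
```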

4. object JobRunner.tryStop()

Jobs that are still in RunnerStatus.RUNNING will be passed into this function. tryStop() should send these jobs the equivalent of a Ctrl-C or SIGTERM (15), or, in the worst case, a SIGKILL (9) if SIGTERM is not available.
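The escalation described above (SIGTERM first, SIGKILL only as a fallback) can be sketched independently of any scheduler. Here `sendSignal` is a hypothetical hook into your scheduler's kill command that returns `true` when the signal was accepted; the names `StopSketch`, `Term`, and `Kill` are illustrative assumptions, and errors are swallowed so that one failed kill does not abort the rest of the shutdown.

```scala
import scala.util.Try

object StopSketch {
  sealed trait Signal
  case object Term extends Signal // SIGTERM (15), the equivalent of Ctrl-C
  case object Kill extends Signal // SIGKILL (9), the last resort

  // `sendSignal` is a hypothetical hook into your scheduler's kill command that
  // returns true if the scheduler accepted the signal for that job id.
  // Returns the ids of jobs that could not be signalled at all.
  def tryStop(runningJobIds: Seq[String],
              sendSignal: (String, Signal) => Boolean): Seq[String] = {
    // Treat an exception from the scheduler as a refused signal.
    def attempt(id: String, signal: Signal): Boolean =
      Try(sendSignal(id, signal)).getOrElse(false)

    // Try SIGTERM first; fall back to SIGKILL only if SIGTERM was refused.
    runningJobIds.filterNot(id => attempt(id, Term) || attempt(id, Kill))
  }
}
```

Returning the unstopped ids rather than throwing leaves it up to the caller to log them, which matches the "try" in tryStop: it is a best-effort cleanup, not a guarantee.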

Running Queue with a new JobRunner

Once there is a basic implementation, you can try out the Hello World example with -jobRunner MyJobRunner.

java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S scala/qscript/examples/HelloWorld.scala -jobRunner MyJobRunner -run

If all goes well, Queue should dispatch the job to your job scheduler and wait until the status returns RunnerStatus.DONE, and "hello world" should be echoed into the output file, possibly along with other log messages.

See QFunction and Command Line Options for more info on Queue options.


Comments

  • Has anyone tried to write a JobRunner that works with SLURM?

  • cbannister · Nottingham, UK · Member

    It seems the links to the LSF and GridEngine examples are broken. Can this be fixed?

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Links fixed. These may break again in the next few weeks as we refactor package names in the codebase. If that happens just let me know.

  • cbannister · Nottingham, UK · Member

    Thanks for the quick response!

  • The links for the GridEngine and LSF examples and for "See QFunction and Command Line Options for more info on Queue options." are broken. Could you please update when you get a chance? Thank you so much.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    We need to overhaul these docs, and unfortunately it's not a priority right now. But we'll try to get these links fixed in the near future.

    As an aside, be sure to check out the Presentations section -- there are a couple of useful resources there for working with Queue.

  • I have an idea for a custom job runner that I'd like to try, but I don't seem to be able to generate Queue.jar after writing it.

    I think I've been able to before - has anything changed? Neither mvn verify nor mvn package generates Queue.jar or any relevant .bz2 that I can find.

    cheers

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Are you able to compile Queue from an untouched copy of the code, of the same version?

  • pdexheimer · Member, Dev

    @dklevebring - are you running mvn verify in the root directory? That is, in gatk-protected rather than gatk-protected/public/gatk-queue?

  • I was trying to compile Queue from gatk-public, which fails, but gatk-protected works. Thanks for the pointer, and sorry for not replying sooner.

    Anyway, I'd love to see this page updated with a bit more information on implementing custom runners. From the information here, it's hard to tell how outdated it is.

    Thanks!

  • pdexheimer · Member, Dev

    I don't think the JobRunners have changed in any substantial way in the last four or five years; this page still looks relevant to me (particularly the suggestion to look at the LSF/SGE implementations). In fact, I think the biggest change since this page was written was the addition of the DRMAA runner, which would be another excellent resource.

  • mxqian · Member
    edited March 4

    Hi @Geraldine_VdAuwera, our job scheduler was upgraded from LSF 8 to LSF 10, and my original QScript no longer works.
    Any ideas? I'd appreciate your help. Many thanks in advance. BTW, a dry run shows no problem, and I tried Java 8 but got the same error:

    A fatal error has been detected by the Java Runtime Environment:
    SIGSEGV (0xb) at pc=0x0000003507b3386f, pid=27007, tid=139944786716416

    JRE version: Java(TM) SE Runtime Environment (7.0_60-b19) (build 1.7.0_60-b19)
    Java VM: Java HotSpot(TM) 64-Bit Server VM (24.60-b09 mixed mode linux-amd64 compressed oops)
    Problematic frame:
    C [libc.so.6+0x13386f] __tls_get_addr@@GLIBC_2.3+0x13386f

    Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    An error report file with more information is saved as:
    xxx/hs_err_pid27007.log

    If you would like to submit a bug report, please visit:
    http://bugreport.sun.com/bugreport/crash.jsp
    The crash happened outside the Java Virtual Machine in native code.
    See problematic frame for where to report the bug.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie
    @mxqian Sorry, that looks like a system crash outside of Queue or GATK. Maybe a problem with insufficient computing resources? I would suggest asking your IT support team for help.

  • pdexheimer · Member, Dev

    @mxqian - My experience is that Queue only works with LSF versions 7 and 8 - it failed for me in a very similar way on v9 several years ago.

    I have thus far managed to persuade my cluster admin to stay on LSF 8, but he's getting antsy. Time to start migrating to Cromwell...

  • mxqian · Member

    @Geraldine_VdAuwera said:
    @mxqian Sorry, that looks like a system crash outside of Queue or GATK. Maybe a problem with insufficient computing resources? I would suggest asking your IT support team for help.

    It doesn't seem to be about insufficient computing resources. I tried the "Hello world" QScript and got the same error, and it's the same with Queue 3.7. So I guess it could be related to "LibC.scala", but I have no idea about that.

  • mxqian · Member
    edited March 6

    @pdexheimer said:
    @mxqian - My experience is that Queue only works with LSF versions 7 and 8 - it failed for me in a very similar way on v9 several years ago.

    I have thus far managed to persuade my cluster admin to stay on LSF 8, but he's getting antsy. Time to start migrating to Cromwell...

    Yeah, I'm thinking about that. However, I've only seen detailed documentation for running locally, not much for clusters. And rewriting all of our NGS pipelines would drive me crazy; just thinking about it gives me a headache. BTW, a kind person from our IT department will keep 2 nodes on LSF 8 for a while so I can finish the remaining parts of the project. She saved my life.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie
    edited March 7

    Ah right, that makes sense. Thanks for jumping in, @pdexheimer.

    @mxqian We know of several centers running Cromwell on their clusters and we do have plans to document this in the near future, so don't let the current scarcity of cluster backend docs put you off of Cromwell permanently. Re: the pipelines, we are now sharing our own WDLs; even if they're not a drop-in replacement for your existing pipelines, we hope it might help ease the transition if/when you get there. Maybe when we all switch to GATK4? Anyway, good luck!

  • mxqian · Member

    @Geraldine_VdAuwera Thank you so much for the information. It's time to learn WDL now. It doesn't seem too hard and looks quite straightforward. BTW, I actually don't like Scala very much; I only learned a little of it for QScript. I prefer Java and Groovy.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Indeed, WDL was designed specifically to be easy to write! I don't like Scala much either...
