gatk "queue" - just getting started, trying to get "hello world" example working with Grid Engine.

Good morning team!
First, I should qualify my question: I'm a Unix sysadmin trying to get the "queue" functionality implemented on our cluster so our analysts can play. I'm hoping my question is simple. Here goes:
We have SGE, and I have downloaded the binary "queue" package.
My first attempt at executing the "hello world" example came up with this error:
[email protected]:~> java -jar /apps/Queue-2.5-2-gf57256b/Queue.jar -S /apps/Queue-2.5-2-gf57256b/examples/HelloWorld.scala -jobRunner GridEngine -run
INFO 11:04:28,560 QScriptManager - Compiling 1 QScript
INFO 11:04:31,265 QScriptManager - Compilation complete
INFO 11:04:31,340 HelpFormatter - ----------------------------------------------------------------------
INFO 11:04:31,340 HelpFormatter - Queue v2.5-2-gf57256b, Compiled 2013/05/01 09:29:04
INFO 11:04:31,340 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 11:04:31,340 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:04:31,341 HelpFormatter - Program Args: -S /apps/Queue-2.5-2-gf57256b/examples/HelloWorld.scala -jobRunner GridEngine -run
INFO 11:04:31,341 HelpFormatter - Date/Time: 2013/06/05 11:04:31
INFO 11:04:31,341 HelpFormatter - ----------------------------------------------------------------------
INFO 11:04:31,341 HelpFormatter - ----------------------------------------------------------------------
INFO 11:04:31,346 QCommandLine - Scripting HelloWorld
INFO 11:04:31,363 QCommandLine - Added 1 functions
INFO 11:04:31,364 QGraph - Generating graph.
INFO 11:04:31,373 QGraph - Running jobs.
ERROR 11:04:31,427 QGraph - Uncaught error running jobs.
java.lang.UnsatisfiedLinkError: Unable to load library 'drmaa': libdrmaa.so: cannot open shared object file: No such file or directory
Oops! It seems Queue can't find the drmaa library by default. So I fixed that by adding the following directory to the library search path on the node: /gridware/sge/lib/lx-amd64 (which is where that library lives).
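One way to do the library-path step above is through LD_LIBRARY_PATH. The post does not say which mechanism was actually used, so treat this as a sketch under that assumption. The helper name add_drmaa_dir is made up for illustration, and the directory you pass in will differ between installations.

```shell
# Prepend a directory to LD_LIBRARY_PATH, but only if it really contains
# libdrmaa.so. add_drmaa_dir is a hypothetical helper, not part of Queue or SGE.
add_drmaa_dir() {
    dir="$1"
    if [ -e "$dir/libdrmaa.so" ]; then
        # Keep any existing search path after the new directory.
        LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
        export LD_LIBRARY_PATH
        return 0
    fi
    echo "libdrmaa.so not found in $dir" >&2
    return 1
}
```

With the paths from this thread, that would be used as: add_drmaa_dir /gridware/sge/lib/lx-amd64 && java -jar /apps/Queue-2.5-2-gf57256b/Queue.jar -S /apps/Queue-2.5-2-gf57256b/examples/HelloWorld.scala -jobRunner GridEngine -run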
Success! Sort of. The error above is resolved, but I am now getting the error below, and this is where I'm stuck. It doesn't look like the job is actually getting submitted, OR, it's getting submitted and dies. I would really appreciate any insight the team can offer, we are very excited to try to get this environment to work, thank you in advance!
[email protected]:~> java -jar /apps/Queue-2.5-2-gf57256b/Queue.jar -S /apps/Queue-2.5-2-gf57256b/examples/HelloWorld.scala -jobRunner GridEngine -run
INFO 11:07:52,728 QScriptManager - Compiling 1 QScript
INFO 11:07:55,208 QScriptManager - Compilation complete
INFO 11:07:55,271 HelpFormatter - ----------------------------------------------------------------------
INFO 11:07:55,271 HelpFormatter - Queue v2.5-2-gf57256b, Compiled 2013/05/01 09:29:04
INFO 11:07:55,271 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 11:07:55,271 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 11:07:55,272 HelpFormatter - Program Args: -S /apps/Queue-2.5-2-gf57256b/examples/HelloWorld.scala -jobRunner GridEngine -run
INFO 11:07:55,272 HelpFormatter - Date/Time: 2013/06/05 11:07:55
INFO 11:07:55,272 HelpFormatter - ----------------------------------------------------------------------
INFO 11:07:55,272 HelpFormatter - ----------------------------------------------------------------------
INFO 11:07:55,276 QCommandLine - Scripting HelloWorld
INFO 11:07:55,292 QCommandLine - Added 1 functions
INFO 11:07:55,292 QGraph - Generating graph.
INFO 11:07:55,298 QGraph - Running jobs.
INFO 11:07:55,481 FunctionEdge - Starting: echo hello world
INFO 11:07:55,482 FunctionEdge - Output written to /shared/users/kcb/HelloWorld-1.out
ERROR 11:07:55,507 Retry - Caught error during attempt 1 of 4.
org.ggf.drmaa.InternalException: Error reading answer list from qmaster
at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:400)
at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.runJob(JnaSession.java:79)
at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner$$anonfun$liftedTree1$1$1.apply$mcV$sp(DrmaaJobRunner.scala:87)
at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner$$anonfun$liftedTree1$1$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner$$anonfun$liftedTree1$1$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.sting.queue.util.Retry$.attempt(Retry.scala:49)
at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:85)
at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.start(DrmaaJobRunner.scala:84)
at org.broadinstitute.sting.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:434)
at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:156)
at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:171)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
ERROR 11:07:55,510 Retry - Retrying in 1.0 minute.
Answers
I have to add: Running the job without the gridengine jobrunner WORKS, so it doesn't look like an issue with the required basics.
Hi @caseybea,
Welcome to GATK! We'll do what we can to help you set up the playroom for your users.
Although the first thing I'm going to do is punt on your question, because we don't use SGE ourselves, and the job runner is mostly the result of external contributions iirc. We have a few users here who do have much more experience with it than us, particularly @Johan_Dahlberg who has submitted patches to the drmaa job runner. Hopefully he (or others) might have a minute to jump in and perhaps shed some light on the behavior you're seeing.
Hm. I may have jumped the gun. Before I even introduce the jobrunner stuff, I thought QUEUE was working to completion. Not so sure now?
This is what I get when running the hello-world example, no queue runner:
The only output I see is the HelloWorld.jobreport.txt file, and all that's in it is the following. I don't actually see output?:
Run it with -startFromScratch. You've run it successfully once, and Queue noted that (with an empty file called .SOMETHING.done). When you reran, it saw that earlier success and didn't bother running the job (notice that immediately after "Running jobs" it claimed success).
Ah! OK, that fixed my intermediary issue; I can once again verify that this works without the jobrunner (thank you!!).
I'm now back to my original issue, hoping someone can shed light. I also verified that the various "qsub" examples shown on the gatk/queue debugging web page all work fine.
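A small sketch to go with the -startFromScratch tip above: it lists hidden files ending in .done in a directory, so you can see what Queue has recorded as finished before rerunning. That the markers sit in the directory Queue was run from is an assumption (the reply only says the file is called .SOMETHING.done), and list_done_markers is a hypothetical helper name.

```shell
# List hidden "*.done" marker files directly inside a directory
# (defaults to the current directory). Hypothetical helper, not part of Queue.
list_done_markers() {
    dir="${1:-.}"
    # -maxdepth is supported by GNU and BSD find.
    find "$dir" -maxdepth 1 -type f -name '.*.done'
}
```

If markers show up and you want the job to run again anyway, add -startFromScratch to the Queue command line as suggested in the reply.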
Hi everyone! I really appreciate the couple of tips added above, but I am sadly still trying to figure out why the job(s) don't actually execute in SGE. If anyone who is familiar with this can assist, that would be awesome. I promise to follow up with personal notes and observations about how it all works. @Johan_Dahlberg - might you be able to take a moment to view the error? Our entire sequencing core here is totally excited about the possibility of getting GATK to operate across multiple nodes!
I will post the error below here formatted in a cleaner way for easy reading. In reviewing my original post, it's a mess (sorry about that!)
Sorry for the very late answer. I've been too busy with other stuff to drop by here. Looking at the error above I'd say that:
org.ggf.drmaa.InternalException: Error reading answer list from qmaster
is the key to solving this problem. However, I did some quick looking around for something to shed some light on this, and the only things I found were 2 C source files (and since I don't read C it didn't help me very much). Here are the links if anyone can make sense of it:
https://github.com/gridengine/gridengine/blob/master/source/libs/japi/msg_japi.h
http://arc.liv.ac.uk/repos/hg/sge/source/libs/japi/japi.c
Searching for "MSG_JAPI_BAD_GDI_ANSWER_LIST" in the first link should take you where you want to go. But as I said, since my C is almost nonexistent I can't really figure out in any detail what the method does.
My ideas on how to move forward with this would be to:
1) Add a Thread.sleep(120*1000) to the HelloWorld script to get the script to stay on the node for 2 minutes (or as long as you need), and see if it pops up in the "running jobs" list. Since I don't use GE myself I can't provide a command to do this, but I guess that there is some equivalent of squeue in SLURM. If it doesn't show up in the list then at least you can conclude that it's not being sent to a node by GE.
2) Have a look in the code here:
https://github.com/broadgsa/gatk-protected/blob/2a7af4316478348f7ea58e0803b3391593d6dbd6/public/scala/src/org/broadinstitute/sting/queue/engine/gridengine/GridEngineJobRunner.scala
to see if all arguments that you would normally need to set when running jobs manually are being set correctly. This was my problem when first starting to use Queue. Since our cluster enforces that a time argument has to be sent with the job, and Queue didn't give one, my jobs were not sent to the queue.
3) If you don't get that to work, try running Queue with -jobRunner Drmaa -jobNative <whatever args you need> and see if that works better. I run on SLURM using only the default Drmaa jobRunner and that works great.
As I said, sorry for the late answer. Please let me know if there's anything more I can help with.
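For idea 1 above, Grid Engine's counterpart to SLURM's squeue is qstat. Here is a minimal sketch that polls it while the test job should be running. The names watch_for_jobs and QSTAT are hypothetical and introduced only for this example, and the sketch assumes qstat prints nothing when the user has no jobs.

```shell
# Poll the scheduler's job list until the given user has at least one job.
# Arguments: user, number of checks (default 12), seconds between checks (default 10).
# QSTAT is a hypothetical override so the function can be tried without a cluster;
# by default it calls Grid Engine's qstat.
watch_for_jobs() {
    user="$1"
    tries="${2:-12}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        listing=$("${QSTAT:-qstat}" -u "$user" 2>/dev/null) || listing=""
        if [ -n "$listing" ]; then
            # Something is queued or running: show it and stop polling.
            printf '%s\n' "$listing"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "no jobs listed for $user after $tries checks" >&2
    return 1
}
```

Start Queue in one terminal and run watch_for_jobs "$USER" in another. If nothing is ever listed, the job most likely never reached the scheduler.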
Also: Adding -l DEBUG to the command line might turn up some more information on what's going on.
I know it's a year old, but I also came across libdrmaa issues. Aside from the excellent advice above, I would recommend the obvious check to see if you are trying to submit a Queue job from within a node on your cluster. Some clusters restrict submitting jobs that use the libdrmaa scheduler from within a node (even if you make the path to the lib explicit with -Djava.library.path=[path to libdrmaa]). The solution is to try submitting the job from the head node.
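In Grid Engine, one mechanism that restricts where jobs can be submitted from is the submit-host list, which qconf -ss prints. Whether that is what caused the restriction described above is an assumption. The names is_submit_host and QCONF are hypothetical and introduced only for this sketch.

```shell
# Succeed if the given host name appears as a whole line in the submit-host list.
# QCONF is a hypothetical override so the function can be tried without a cluster;
# by default it calls Grid Engine's qconf.
is_submit_host() {
    host="$1"
    "${QCONF:-qconf}" -ss 2>/dev/null | grep -F -x -q -- "$host"
}
```

For example: is_submit_host "$(hostname)" || echo "this host is not listed as a submit host". The list may use short or fully qualified names, so try both forms before drawing a conclusion.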