The current GATK version is 3.6-0

# Queue with Grid Engine

edited February 2014

## 1. Background

Thanks to contributions from the community, Queue contains a job runner compatible with Grid Engine 6.2u5.

As of July 2011, several forked distributions of Sun's Grid Engine 6.2u5 were known. As long as they remain source-compatible with Grid Engine 6.2u5 at the JDRMAA 1.0 level, the compiled Queue code should run against each of them. However, we have not yet received confirmation that Queue works on any of these setups.

Our internal QScript integration tests run the same tests on both an LSF 7.0.6 cluster and a Grid Engine 6.2u5 cluster built on the older software released by Sun.

If you run into trouble, please let us know. If you would like to contribute additions or bug fixes, please fork our GitHub repository so that we can review and pull in the patch.

## 2. Running Queue with GridEngine

Try out the Hello World example with -jobRunner GridEngine:

```
java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/examples/HelloWorld.scala -jobRunner GridEngine -run
```


If all goes well, Queue should dispatch the job to Grid Engine and wait until the status returns RunningStatus.DONE, and "hello world" should be echoed into the output file, possibly along with other Grid Engine log messages.

See QFunction and Command Line Options for more info on Queue options.

## 3. Debugging issues with Queue and GridEngine

If you run into an error with Queue submitting jobs to GridEngine, first try submitting the HelloWorld example with -memLimit 2:

```
java -Djava.io.tmpdir=tmp -jar dist/Queue.jar -S public/scala/qscript/examples/HelloWorld.scala -jobRunner GridEngine -run -memLimit 2
```


Then try the following GridEngine qsub commands. They are based on what Queue submits via the API when running the HelloWorld.scala example with and without memory reservations and limits:

```
qsub -w e -V -b y -N echo_hello_world \
  -o test.out -wd $PWD -j y echo hello world

qsub -w e -V -b y -N echo_hello_world \
  -o test.out -wd $PWD -j y \
  -l mem_free=2048M -l h_rss=2458M echo hello world
```
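The two commands above differ only in their memory flags. If you want to generate equivalent test submissions for other sizes, a minimal Scala sketch like the following can rebuild them. It assumes mem_free is the limit converted to megabytes and h_rss is 1.2 times that, rounded to the nearest megabyte; that ratio is inferred from the example values in this article, not taken from Queue's source, and `QsubTest`, `memoryFlags` and `qsubCommand` are hypothetical names.

```scala
// Hypothetical helper that rebuilds the qsub test commands from this article.
// Assumption: h_rss = 1.2 x mem_free, rounded to the nearest MB, which is
// inferred from the example values above rather than taken from Queue's source.
object QsubTest {
  /** Memory flags for a limit given in GB, e.g. 2 -> mem_free=2048M, h_rss=2458M. */
  def memoryFlags(memLimitGb: Int): String = {
    val memFreeMb = memLimitGb * 1024
    // Integer form of round(memFreeMb * 1.2), avoiding floating-point error.
    val hRssMb = (memFreeMb * 12 + 5) / 10
    s"-l mem_free=${memFreeMb}M -l h_rss=${hRssMb}M"
  }

  /** Full test command; None omits the memory reservation and limit. */
  def qsubCommand(memLimitGb: Option[Int]): String = {
    // $PWD is left for the shell to expand (this is not an interpolated string).
    val base = "qsub -w e -V -b y -N echo_hello_world -o test.out -wd $PWD -j y"
    val mem = memLimitGb.map(gb => " " + memoryFlags(gb)).getOrElse("")
    s"$base$mem echo hello world"
  }
}
```

For example, `Seq(2, 4, 8, 16).foreach(gb => println(QsubTest.qsubCommand(Some(gb))))` prints a 2G to 16G sweep. For 16 GB the 1.2 rule gives h_rss=19661M rather than the 19960M used in the hand-written command; both request the same 16G reservation.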


One other thing to check is whether your cluster enforces a memory limit. For example, try submitting jobs with reservations of up to 16G:

```
qsub -w e -V -b y -N echo_hello_world \
  -o test.out -wd $PWD -j y \
  -l mem_free=4096M -l h_rss=4915M echo hello world

qsub -w e -V -b y -N echo_hello_world \
  -o test.out -wd $PWD -j y \
  -l mem_free=8192M -l h_rss=9830M echo hello world

qsub -w e -V -b y -N echo_hello_world \
  -o test.out -wd $PWD -j y \
  -l mem_free=16384M -l h_rss=19960M echo hello world
```

If the above tests pass and Grid Engine still will not dispatch jobs submitted by Queue, please report the issue to our support forum.

Geraldine Van der Auwera, PhD

## Comments

• Posts: 1Member

You should use h_vmem instead of, or along with, mem_free in the qsub submission examples above. mem_free only checks memory usage at the time the job first enters running status, which is OK for short-lived processes but not for long-lived ones, whose memory usage can grow over time. For example:

```
qsub -l h_vmem=16G,mem_free=16G ...
```

• Broad InstitutePosts: 37Dev
edited May 2013

You mean to use public/scala/qscript/org/broadinstitute/sting/queue/qscripts/examples/HelloWorld.scala, right?

• Posts: 27Member
edited February 2014

Hello there,

I've got an issue running scatter-gather on Grid Engine 6.2u5 on Red Hat. When I first ran it, it reported that libdrmaa.so was missing, so I did a cluster-wide search and found the admins' version of libdrmaa.so. That meant I could finally run basic hello world scripts, such as the one below:

```
$ java -Djava.io.tmpdir=$temp \
    -jar $queu -jobRunner GridEngine \
    -S $home/QUEUETools/newest/resources/ExampleUnifiedGenotyper.scala \
    -R $home/QUEUETools/newest/resources/exampleFASTA.fasta \
    -I $home/QUEUETools/newest/resources/exampleBAM.bam -run
INFO 18:31:07,505 QScriptManager - Compiling 1 QScript
INFO 18:31:13,574 QScriptManager - Compilation complete
INFO 18:31:13,697 HelpFormatter - ----------------------------------------------------------------------
INFO 18:31:13,697 HelpFormatter - Queue v2.7-2-g6bda569, Compiled 2013/08/28 16:33:34
INFO 18:31:13,697 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 18:31:13,697 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 18:31:13,698 HelpFormatter - Program Args: -jobRunner GridEngine -S /xxx/QUEUETools/newest/resources/ExampleUnifiedGenotyper.scala -R /xxx/QUEUETools/newest/resources/exampleFASTA.fasta -I /xxx/QUEUETools/newest/resources/exampleBAM.bam -run
INFO 18:31:13,698 HelpFormatter - Date/Time: 2014/02/03 18:31:13
INFO 18:31:13,698 HelpFormatter - ----------------------------------------------------------------------
INFO 18:31:13,699 HelpFormatter - ----------------------------------------------------------------------
INFO 18:31:13,708 QCommandLine - Scripting ExampleUnifiedGenotyper
INFO 18:31:13,844 QCommandLine - Added 2 functions
INFO 18:31:13,844 QGraph - Generating graph.
INFO 18:31:13,872 QGraph - Generating scatter gather jobs.
INFO 18:31:13,903 QGraph - Removing original jobs.
INFO 18:31:13,907 QGraph - Adding scatter gather jobs.
INFO 18:31:14,688 QGraph - Regenerating graph.
INFO 18:31:14,706 QGraph - Running jobs.
INFO 18:31:15,322 QGraph - 0 Pend, 0 Run, 0 Fail, 7 Done
INFO 18:31:16,379 QCommandLine - Writing final jobs report...
INFO 18:31:16,380 QJobsReporter - Writing JobLogging GATKReport to file /xxx/QUEUETools/Queue_2.7.2/resources/ExampleUnifiedGenotyper.jobreport.txt
INFO 18:31:16,635 QJobsReporter - Plotting JobLogging GATKReport to file /xxx/QUEUETools/Queue_2.7.2/resources/ExampleUnifiedGenotyper.jobreport.pdf
WARN 18:31:16,648 RScriptExecutor - Skipping: Rscript (resource)org/broadinstitute/sting/queue/util/queueJobReport.R /xxx/QUEUETools/Queue_2.7.2/resources/ExampleUnifiedGenotyper.jobreport.txt /xxx/QUEUETools/Queue_2.7.2/resources/ExampleUnifiedGenotyper.jobreport.pdf
INFO 18:31:16,655 QCommandLine - Script completed successfully with 7 total jobs
```

So, that's fine. However, when I try to run basically the same script on actual BAM files, I get this error:

```
$ java -Djava.io.tmpdir=$temp \
    -jar $queu -jobRunner GridEngine \
    -S $home/QUEUETools/newest/resources/ExampleUnifiedGenotyper.scala \
    -R $dxfa \
    -I $gatr/bamlists/currentrecalbams.test2.list -run
blabla
INFO 18:36:22,307 QGraph - Generating scatter gather jobs.
INFO 18:36:22,338 QGraph - Removing original jobs.
INFO 18:36:22,341 QGraph - Adding scatter gather jobs.
INFO 18:36:23,164 QGraph - Regenerating graph.
INFO 18:36:23,200 QGraph - Running jobs.
INFO 18:36:27,499 FunctionEdge - Starting: LocusScatterFunction: List(/share/XFS0016/gata/bamlists/currentrecalbams.test2.list, /ifshk7/ST_PG/PMO/SZY11098/indx/GATKh19bundle/ucsc.hg19.fasta) > List(/ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_1_of_3/scatter.intervals, /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_2_of_3/scatter.intervals, /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_3_of_3/scatter.intervals)
INFO 18:36:27,499 FunctionEdge - Output written to /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/scatter/scatter.out
INFO 18:36:28,067 QGraph - 6 Pend, 1 Run, 0 Fail, 0 Done
INFO 18:36:58,383 FunctionEdge - Done: LocusScatterFunction: List(/share/XFS0016/gata/bamlists/currentrecalbams.test2.list, /ifshk7/ST_PG/PMO/SZY11098/indx/GATKh19bundle/ucsc.hg19.fasta) > List(/ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_1_of_3/scatter.intervals, /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_2_of_3/scatter.intervals, /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_3_of_3/scatter.intervals)
INFO 18:36:58,387 QGraph - Writing incremental jobs reports...
INFO 18:36:58,388 QJobsReporter - Writing JobLogging GATKReport to file /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/ExampleUnifiedGenotyper.jobreport.txt
INFO 18:36:58,610 FunctionEdge - Starting: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/share/XFS0016/temp' '-cp' '/xxx/QUEUETools/newest/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'UnifiedGenotyper' '-I' '/share/XFS0016/gata/bamlists/currentrecalbams.test2.list' '-L' '/ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_1_of_3/scatter.intervals' '-R' '/ifshk7/ST_PG/PMO/SZY11098/indx/GATKh19bundle/ucsc.hg19.fasta' '-o' '/ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_1_of_3/currentrecalbams.test2.listunfiltered.vcf'
INFO 18:36:58,611 FunctionEdge - Output written to /ifshk5/PC_HUMAN_AP/PMO/SZY11098_HUMbjjR/QUEUETools/Queue_2.7.2/resources/.queue/scatterGather/ExampleUnifiedGenotyper-1-sg/temp_1_of_3/currentrecalbams.test2.listunfiltered.vcf.out
ERROR 18:36:58,890 Retry - Caught error during attempt 1 of 4.
org.broadinstitute.sting.queue.QException: Unable to submit job: error: no suitable queues
blablabla
```

I know what value to enter in the queue field; the default test command gives the same error:

```
qsub -w e -V -b y -N echo_hello_world -l vf=4G -o test.out -wd $PWD -j y echo hello world
Unable to run job: error: no suitable queues.
Exiting.
```


Which I can correct like this:

```
qsub -w e -V -b y -N echo_hello_world -l vf=5G -q st.q -P st_pg vf=4G -o test.out -cwd -j y echo hello world
Your job 990540 ("echo_hello_world") has been submitted
```


My question is: how do I change the default parameters of DRMAA/Queue so that it uses my desired -q parameter? It seems I can't edit the .so files.

Hi there,

We don't work with DRMAA so I can't help you, but perhaps one of our resident superusers such as @pdexheimer or @Johan_Dahlberg will be able to jump in with an answer.

Geraldine Van der Auwera, PhD

• Posts: 543Member, Dev ✭✭✭✭

There's a global -jobQueue argument (e.g., java -jar Queue.jar -S script.scala -jobQueue st.q …), but it looks like the DRMAA runner never uses it. Unfortunately, I don't know anything about DRMAA either, so I don't know exactly how to make the fix.

As a workaround, you can try Queue's -jobNative argument (or the equivalent QFunction property .jobNativeArgs) to pass arguments directly to DRMAA.

Joel Thibault ~ Software Engineer ~ GSA ~ Broad Institute
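To see why the jobNative workaround above can stand in for a missing -jobQueue, it helps to picture the native specification string that a DRMAA-based runner hands to Grid Engine. The Scala sketch below is a simplified, hypothetical model: `NativeSpec.build` and its parameters are illustrative names, not Queue's API, and it assumes that user-supplied native arguments are appended verbatim after the runner's own flags.

```scala
// Hypothetical model of how a DRMAA-based runner could assemble the native
// specification string; the names here are illustrative, not Queue's API.
object NativeSpec {
  /**
   * @param jobName       value for qsub's -N
   * @param memFreeMb     optional memory reservation in MB (-l mem_free)
   * @param jobNativeArgs extra arguments appended verbatim, as with -jobNative
   */
  def build(jobName: String, memFreeMb: Option[Int], jobNativeArgs: Seq[String]): String = {
    val runnerFlags = Seq("-w e", "-V", s"-N $jobName") ++
      memFreeMb.map(mb => s"-l mem_free=${mb}M").toSeq
    // User-supplied arguments go last, so -q or -P reach qsub unchanged.
    (runnerFlags ++ jobNativeArgs).mkString(" ")
  }
}
```

In this model, `NativeSpec.build("echo_hello_world", Some(2048), Seq("-q st.q", "-P st_pg"))` yields `-w e -V -N echo_hello_world -l mem_free=2048M -q st.q -P st_pg`. How a real DRMAA library interprets that string is implementation-specific.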

• Posts: 96Member ✭✭✭

Yes, I can second @thibault's solution. However, whether it works depends on the DRMAA implementation, since implementations seem to handle the jobNative arguments quite differently.

• Posts: 27Member

That sounds very promising, actually.

I've narrowed it down: I actually won't need the qsub -q argument; all I need is a -P argument (qsub -P st_pg).

However, I'm not sure of the syntax for the native argument. When I write it like this:

```
java -Djava.io.tmpdir=$temp -jar $queu -jobRunner GridEngine -S ExampleCountReads.scala -R exampleFASTA.fasta -I exampleBAM.bam -jobNative -P st_pg -memLimit 2 -run
```

I get this:

```
INFO  14:20:11,924 QScriptManager - Compiling 1 QScript
INFO  14:20:17,726 QScriptManager - Compilation complete
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
Argument with name 'P' isn't defined.
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 2.7-2-g6bda569):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR
##### ERROR MESSAGE: Argument with name 'P' isn't defined.
##### ERROR ------------------------------------------------------------------------------------------
```


Can anybody guess what the argument format is supposed to be?

• Posts: 27Member
edited February 2014

Ah - I continued to play with it, and I just had to format the argument as a string:

```
-jobNative "-P st_pg -l vf=6G etc etc etc"
```

Thanks to @thibault and @Johan_Dahlberg, you guys are brilliant. I've got Queue working and the pertinent jobs submitted.
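The unquoted form fails because the shell hands -P and st_pg to Queue as separate tokens, and Queue's argument parser then treats -P as an argument name of its own (hence "Argument with name 'P' isn't defined"). Quoting keeps the whole native string as a single value for -jobNative. The toy parser below is a hypothetical illustration of that behavior, not the GATK argument parser; it assumes a token counts as an argument name only when it is a dash followed by a single whitespace-free word.

```scala
// Toy argument parser illustrating the quoting issue; NOT the GATK parser.
// Assumption: a token is an argument name only if it is one or two dashes
// followed by a single word with no whitespace; anything else is a value.
object ToyArgParser {
  private val ArgName = "-{1,2}([A-Za-z][A-Za-z0-9_]*)".r

  /** Returns argument name -> values, or Left(error) for an unknown name. */
  def parse(tokens: List[String], known: Set[String]): Either[String, Map[String, List[String]]] = {
    @annotation.tailrec
    def loop(rest: List[String], current: Option[String],
             acc: Map[String, List[String]]): Either[String, Map[String, List[String]]] =
      rest match {
        case Nil => Right(acc)
        case ArgName(name) :: tail =>
          if (known(name)) loop(tail, Some(name), acc.updated(name, acc.getOrElse(name, Nil)))
          else Left(s"Argument with name '$name' isn't defined.")
        case value :: tail =>
          current match {
            // Attach the value to the most recent argument name.
            case Some(name) => loop(tail, current, acc.updated(name, acc(name) :+ value))
            case None       => Left(s"Unexpected value '$value'")
          }
      }
    loop(tokens, None, Map.empty)
  }
}
```

With `Set("jobNative", "run")` as the known names, `List("-jobNative", "-P", "st_pg")` is rejected because of the stray -P token, while `List("-jobNative", "-P st_pg -l vf=6G", "-run")` stores the whole quoted string under jobNative.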

• Posts: 12Member

Hi,
Queue in general works fine for me on Grid Engine. There is a small performance tweak I would like to suggest.
At the moment, GridEngineJobRunner.scala forces "the remote environment to inherit local environment settings".
That might be a good idea in general, to make sure the jobs get everything they need, but with hundreds of clustered jobs it unnecessarily
slows down the system. I'm not much of a Scala programmer (yet), so I don't see a way to turn the -V flag off other than removing it manually in the source code and recompiling the whole thing.
It would be nice to be able to set the inheritance to false.
Maybe @pdexheimer or @Johan_Dahlberg know a solution?

• Posts: 543Member, Dev ✭✭✭✭

@DavidRies - As you suggest, the -V parameter is always set for GridEngine jobs. You're right, at the moment you'd have to remove it in the code and recompile Queue to get rid of it.

The solution would be to add another argument to QSettings, then conditionally add -V to nativeSpec depending on the value of that argument. However, adding a runner-specific argument to the global QSettings wouldn't be great; it should really be something applicable to any runner in general. I'm not certain exactly what -V does (beyond what's in the comment, of course), so I'm not sure whether it's an easily generalizable concept.
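For reference, qsub's -V option exports all environment variables of the submitting shell to the job's environment. The conditional approach described above could look roughly like this minimal Scala sketch, where `GridEngineSpec`, `nativeSpec` and `inheritEnvironment` are hypothetical names (Queue has no such setting) and the default of `true` preserves the current always-on behavior.

```scala
// Sketch of the change described above. `inheritEnvironment` is a hypothetical
// setting (Queue has no such argument); the default keeps today's behavior.
object GridEngineSpec {
  def nativeSpec(jobNativeArgs: Seq[String], inheritEnvironment: Boolean = true): String = {
    // qsub -V exports the submitting shell's environment variables to the job.
    val envFlag = if (inheritEnvironment) Seq("-V") else Seq.empty[String]
    (envFlag ++ jobNativeArgs).mkString(" ")
  }
}
```

Defaulting to `true` means existing scripts behave exactly as before, and only users who opt out (for example, large batches whose jobs do not need the submitting shell's environment) would lose -V.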

• Posts: 11Member

@Geraldine_VdAuwera - When running Queue for LSF jobs, how do I pass a parameter like "-n 3"? I used -jobNative "-n 3", but that didn't seem to work. Is there any way to do that? Thanks.

I'm not sure, @mxqian. I never use it that way. Hopefully someone else in this thread will jump in to help.

Geraldine Van der Auwera, PhD

• Posts: 543Member, Dev ✭✭✭✭

@mxqian

You would set the CommandLineFunction.nCoresRequest field. For example, in your case class for a particular job (like IndelRealigner or HaplotypeCaller), you would specify `this.nCoresRequest = 3`.

However, I would suggest that in practice it's generally better to increase the scatterCount than to run multi-threaded.
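Increasing scatterCount splits the intervals into more pieces, each processed by its own cluster job (the temp_1_of_3 … temp_3_of_3 scatter directories in the log earlier in this thread are an example). The sketch below shows one simple way such a split could be computed: whole intervals are grouped contiguously into scatterCount groups of roughly equal total length. It is an illustration under that assumption; `ScatterSketch` and its greedy rule are hypothetical, not the algorithm Queue's scatter functions use.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrative only: groups whole intervals into `scatterCount` contiguous
// chunks of roughly equal total length. This is not Queue's scatter code.
object ScatterSketch {
  final case class Interval(contig: String, start: Long, end: Long) {
    def length: Long = end - start + 1 // 1-based, inclusive coordinates
  }

  def scatter(intervals: Seq[Interval], scatterCount: Int): List[List[Interval]] = {
    require(scatterCount > 0, "scatterCount must be positive")
    // Approximate number of bases each group should receive.
    val target = math.max(1L, intervals.map(_.length).sum / scatterCount)
    val groups = ArrayBuffer(ArrayBuffer.empty[Interval])
    var filled = 0L // bases assigned to the current group so far
    for (iv <- intervals) {
      // Open a new group once the current one reaches the target size,
      // unless all scatterCount groups already exist.
      if (filled >= target && groups.size < scatterCount) {
        groups += ArrayBuffer.empty[Interval]
        filled = 0L
      }
      groups.last += iv
      filled += iv.length
    }
    groups.map(_.toList).toList
  }
}
```

With six 100 bp intervals and scatterCount = 3, this produces three groups of two intervals each; a real implementation may also split large intervals, which this sketch does not attempt.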

• Posts: 11Member

@pdexheimer Great. Thank you so much.