The current GATK version is 3.7-0

# Job names in Queue using DRMAA job runner

Member
edited January 2013

I've been running Queue using DRMAA, and I've noticed one thing I would like to bring up for discussion. The job names are currently generated using the following code:

```scala
// Set the display name to < 512 characters of the description
// NOTE: Not sure if this is configuration specific?
protected val jobNameLength = 500
protected val jobNameFilter = """[^A-Za-z0-9_]"""
protected def functionNativeSpec = function.jobNativeArgs.mkString(" ")

def start() {
  session.synchronized {
    val drmaaJob: JobTemplate = session.createJobTemplate

    drmaaJob.setJobName(function.description.take(jobNameLength).replaceAll(jobNameFilter, "_"))
    [...]
```


For me this yields names looking something like this:

"__java_____Xmx3072m_____D"

This is not very useful for telling the jobs apart. I'm running my jobs via DRMAA on a system using the SLURM resource manager, so the truncation of the name above can be attributed to SLURM cutting it off. Even so, I think there are more reasonable ways to create the name, using function.jobName for example.
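To see where a name like this comes from, here is a minimal stand-alone sketch of the same expression used in the DrmaaJobRunner excerpt (truncate to `jobNameLength`, then replace every character outside `[A-Za-z0-9_]` with `_`). The sample description string is an assumption, a guess at how Queue quotes the arguments of a `java -Xmx3072m -D...` command line, and the final `take(25)` only imitates the scheduler-side cut-off; its length is likewise assumed.

```scala
object DrmaaJobNameDemo {
  // Same constants as in the DrmaaJobRunner excerpt above.
  val jobNameLength = 500
  val jobNameFilter = """[^A-Za-z0-9_]"""

  // Mirrors: function.description.take(jobNameLength).replaceAll(jobNameFilter, "_")
  def jobName(description: String): String =
    description.take(jobNameLength).replaceAll(jobNameFilter, "_")

  // Hypothetical description: each argument single-quoted and separated by
  // two spaces (an assumption about how Queue formats the command line).
  val sampleDescription = " 'java'  '-Xmx3072m'  '-Djava.io.tmpdir=/tmp'"

  val fullName: String = jobName(sampleDescription)

  // Assumed scheduler-side cut-off, chosen to reproduce the name quoted above.
  val displayedName: String = fullName.take(25)
}
```

Every quote, space, dash, dot, `=` and `/` becomes an underscore, so the leading `java -Xmx...` part of any GATK command line collapses into the same unhelpful prefix for every job.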

So this leads me to my question: is there any particular reason the job names are generated the way they are? And if not, would you (the GATK team) want a patch changing this to use function.jobName instead?

Furthermore, I would be interested in hearing from other users running GATK Queue over DRMAA, since I think it might be interesting to develop this further. As an example, I had to implement setting a hard wall time in the jobRunner, since the cluster I'm running on demands this. I'm sure there are more solutions like that out there, and I would be thrilled to hear about them.

Post edited by Geraldine_VdAuwera on

• Dev

Using command lines as job names is becoming less and less important for display purposes when checking job status with programs like qstat and bjobs.

The biggest example is that we have even considered using job arrays for submission. At that point, if the jobs are all identified in the farm as something like 'queue_job[102]', then neither the current command line nor even .jobName could be displayed. Instead, Queue could log both the 'queue_job[102]' submission name and the full command line. That would at least allow one to figure out which job is which by referencing the Queue logs.
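The logging idea can be sketched as a small registry that hands out array-style names and remembers the full command line behind each one. `ArrayJobLog` and its methods are hypothetical; Queue has no such class, and the sketch only illustrates keeping the scheduler-facing name and the command line side by side.

```scala
import scala.collection.mutable

// Hypothetical helper, not part of Queue: records which command line sits
// behind each array-style job name such as "queue_job[102]".
final class ArrayJobLog(prefix: String) {
  // Insertion-ordered, so log lines come out in submission order.
  private val entries = mutable.LinkedHashMap.empty[String, String]

  /** Registers a command line and returns the name the scheduler would display. */
  def submit(commandLine: String): String = {
    val name = s"$prefix[${entries.size}]"
    entries(name) = commandLine
    name
  }

  /** Looks up the full command line behind a scheduler name. */
  def commandLineFor(name: String): Option[String] = entries.get(name)

  /** One tab-separated log line per job, easy to grep later. */
  def logLines: Seq[String] = entries.map { case (n, c) => s"$n\t$c" }.toSeq
}
```

With such a log, `qstat`/`bjobs` would only show `queue_job[N]`, and the command line would be one grep away in the Queue output.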

That said, to answer your question about why the generated names are the way they are: we started Queue as a wrapper around LSF. When you submit a job to LSF, bsub takes the command line and makes it the job name, up to ~4000 characters. When Queue began development, emulating this behavior was very helpful for debugging. As we added GridEngine support, we were also able to verify that the ~1000-character job names were what we expected.

In the meantime, I personally often use bsub output to quickly figure out which jobs are running or suspended in our LSF cluster. I expect that one day, with job arrays, I will have to take the extra step of going back to the Queue logs or even providing a utility program. Until submitted job names become completely useless, though, we've just been leaving the short truncated names the way they are in GridEngine/DRMAA. But if you have a patch, I'd be more than happy to take a look.

As for the hard wall time: if you have a patch that adds a jobRunLimit to QFunction/QSettings/JobRunners, that's a feature present in most farms that we could include as an option for QScript authors and Queue users. I doubt we would use the functionality ourselves, but perhaps others will.

As for SLURM: while I can't test the jobRunner like we do with GridEngine and LSF, if you had an extension of the DRMAA JobRunner with specific QFunction/CommandLineFunction mappings that reduced the number of Queue command-line options one had to include, I'd be happy to review that patch as well. It could end up benefiting other users who would like to use Queue with SLURM.

Thanks!

• Member

A colleague of mine has been running Queue using DRMAA on Condor, but had to make some changes to the code to make this possible. In the future we will be transitioning to SLURM, and I would be very interested in the changes Johan has made. In particular, I have heard from the person responsible for the cluster (using SLURM) that the hard wall time could be an issue for him.

I was also wondering: are you using GATK 1.x or 2.x, and will Queue remain open source in 2.x?

• Member

I wrote the starting post of this discussion, read kshakir's answer, got other priorities, and then the whole thing sort of slipped my mind.

On the subject of the job names, I have written a patch (only for the DrmaaJobRunner, however), but I haven't had time to test it properly yet. I will get back to you once I have.

Concerning the QFunction/CommandLineFunction mappings, I will make sure to collect my changes into a patch as well. Since I did this by looking at the existing code and trying to replicate it, I'm sure there are some things in there that are not fully up to standard, so any help reviewing them would be much appreciated.

@TimHughes I would be happy to share my experience of running Queue with SLURM/DRMAA, and of course share any code I have. If you want to dive straight in, you can check out my GATK fork at https://github.com/johandahlberg/gatk/tree/devel. Note that you need to look at the devel branch, since the master branch is just my copy of the main GATK repo.

I'm not sure if the last question was aimed at me or at the GATK team. But assuming you were wondering what I've been using: I'm using GATK Lite 2.x (if I've got the terminology right), the open-source version of the current source code. Furthermore, if I've understood things correctly, Queue will stay open source in the future.

I'd be very much interested in incorporating patches to Queue for any other job execution engines, so please do contribute. We intend, as with the GATK, that the framework itself will remain open source so that anyone can use it to run their own scripts, but that potentially some scripts (to be fair, currently none) would be premium tools put into the full release only.

• Member

Now I've gotten around to formatting my patch for the job walltime in the DRMAA jobrunner. I tried to attach it to the post, but the format wasn't allowed, so I'm pasting it below. I can of course also send it by email if anyone's interested. Any comments are very welcome; there may be much better ways to achieve this, and if so I would be happy to hear about them.

```diff
From c696ecf2d36b524e1842d67f54c67961546967aa Mon Sep 17 00:00:00 2001
From: Johan Dahlberg <johan.dahlberg@medsci.uu.se>
Date: Fri, 28 Sep 2012 14:56:08 +0200
Subject: [PATCH] Setting the walltime in the Drmaa jobrunner

---
 .../sting/queue/engine/drmaa/DrmaaJobRunner.scala  |    3 +++
 .../sting/queue/function/CommandLineFunction.scala |   10 ++++++++++
 3 files changed, 17 insertions(+)

index 1a50301..bae3bde 100644
@@ -31,6 +31,10 @@ import org.broadinstitute.sting.commandline.Argument
  * Default settings settable on the command line and passed to CommandLineFunctions.
  */
 class QSettings {
+
+  @Argument(fullName="job_walltime", shortName="wallTime", doc="Setting the required walltime when using the drmaa job runner.", required=false)
+  var jobWalltime: Option[Long] = None
+
   @Argument(fullName="run_name", shortName="runName", doc="A name for this run used for various status messages.", required=false)
   var runName: String = _

index 2aae2fc..31b314c 100644
@@ -65,6 +65,9 @@ class DrmaaJobRunner(val session: Session, val function: CommandLineFunction) ex
         drmaaJob.setJoinFiles(true)
       }

+      if(function.wallTime != null)
+         drmaaJob.setHardWallclockTimeLimit(function.wallTime.get)
+
       drmaaJob.setNativeSpecification(functionNativeSpec)

       // Instead of running the function.commandLine, run "sh <jobScript>"
index 84b6257..66e51b3 100644
@@ -32,6 +32,9 @@ import org.broadinstitute.sting.queue.util._
 trait CommandLineFunction extends QFunction with Logging {
   def commandLine: String

+  /** Setting the wall time request for drmaa job*/
+  var wallTime: Option[Long] = None
+
   /** Upper memory limit */
   var memoryLimit: Option[Double] = None

@@ -63,6 +66,9 @@ trait CommandLineFunction extends QFunction with Logging {
     super.copySettingsTo(function)
     function match {
       case commandLineFunction: CommandLineFunction =>
+        if(commandLineFunction.wallTime.isEmpty)
+          commandLineFunction.wallTime = this.wallTime
+
         if (commandLineFunction.memoryLimit.isEmpty)
           commandLineFunction.memoryLimit = this.memoryLimit

@@ -106,6 +112,10 @@ trait CommandLineFunction extends QFunction with Logging {
    * Sets all field values.
    */
   override def freezeFieldValues() {
+
+    if(wallTime.isEmpty)
+      wallTime = qSettings.jobWalltime
+
     if (jobQueue == null)
       jobQueue = qSettings.jobQueue

--
1.7.9.5
```
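One thing worth checking in the DRMAA hunk above, as far as I can tell: `wallTime` is an `Option[Long]` that defaults to `None`, so `function.wallTime != null` is always true and `function.wallTime.get` would throw a `NoSuchElementException` for any job that has no wall time set. The sketch below shows a safer pattern with `Option.foreach`, together with the fallback order the patch sets up (the function's own value, then the value copied from the parent in `copySettingsTo`, then the `QSettings` default in `freezeFieldValues`, assuming they are applied in that order). `WallclockTemplate` and `RecordingTemplate` are hypothetical stand-ins for the DRMAA `JobTemplate`, so the sketch runs without a DRMAA library.

```scala
// Hypothetical stand-in for the single JobTemplate method the patch calls.
trait WallclockTemplate {
  def setHardWallclockTimeLimit(limit: Long): Unit
}

// Test double that records whether (and with what value) the limit was set.
final class RecordingTemplate extends WallclockTemplate {
  var limit: Option[Long] = None
  def setHardWallclockTimeLimit(l: Long): Unit = { limit = Some(l) }
}

object WallTime {
  // First defined value wins: own setting, then the one copied from the
  // parent function, then the QSettings default.
  def resolve(own: Option[Long], parent: Option[Long], qSettingsDefault: Option[Long]): Option[Long] =
    own.orElse(parent).orElse(qSettingsDefault)

  // Touches the template only when a wall time is set. An Option is never
  // null, so a `!= null` check followed by `.get` would throw on None.
  def applyTo(template: WallclockTemplate, wallTime: Option[Long]): Unit =
    wallTime.foreach(template.setHardWallclockTimeLimit)
}
```

In the runner itself the equivalent one-liner would be `function.wallTime.foreach(drmaaJob.setHardWallclockTimeLimit)`, which leaves jobs without a wall time untouched.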


Hi Johan, thanks for sharing this! We'll have a look and see if we can add this to the codebase. To that end, could you please check that your patch conforms to the patch submission instructions then email it to me at vdauwera@broadinstitute.org? Thanks!

• Member

I think that it does follow the guidelines, but if it does not, please tell me and I will try to fix it. I have emailed you the patch now.

Hi Johan, I'm glad to report that we've finally got around to integrating your walltime patch into the codebase! It will be available in the next release (2.3). Thanks for your contribution!

• Member

My memory is not very long, it seems. I submitted a pull request today (https://github.com/broadgsa/gatk-protected/pull/3) regarding the job names, and then realized that I had actually raised this question before. Anyway, I've now submitted a patch that sets the job names in what seems to me a more reasonable way. I would be very happy if somebody could take a look at the pull request and let me know whether it's a good idea or not.

• Member, Dev

Actually, this issue just came up for me as well. One of my tasks today was to try to figure out how to set the job name to the analysis name for our LSF cluster - I haven't looked at the code yet, but perhaps this could be implemented in a higher-level class and controlled by a command-line parameter?

• Member

If I understand it correctly, this could be achieved in an analogous way to how it's done in the DRMAA job runner, by changing:

```scala
request.jobName = function.description.take(LibBat.MAX_JOB_NAME_LEN)
```


to:

```scala
request.jobName = function.analysisName.take(LibBat.MAX_JOB_NAME_LEN)
```


in Lsf706JobRunner. The analysis name can then be set via the QScript in any way you like.

Personally, I have a setup where I use something like this:

```scala
this.analysisName = projectName.get + "_bwaSamPe"
```


to see which project and which step of the analysis is running.

As far as I can see, there would be no easy way to move this logic to the CommandLineJobRunner trait (the super trait of all other job runners), as each type of job runner interacts with its cluster system in a different fashion. However, someone more knowledgeable on the subject than me might have a good idea on how to achieve that.
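One possible way to split the work is to keep the naming policy (which field to use) in a shared trait and let each runner supply only its own limits and clean-up rules. Everything below is hypothetical: `JobNaming`, `NamedFunction`, `DrmaaNaming` and `LsfNaming` are not Queue classes, and 4000 is only a placeholder echoing the "~4000 characters" mentioned earlier in the thread, not the real value of `LibBat.MAX_JOB_NAME_LEN`.

```scala
// Simplified stand-in for the one QFunction member the runners need.
trait NamedFunction {
  def shortDescription: String
}

// Hypothetical shared trait: the choice of field lives in one place,
// and each runner states only its scheduler-specific limits.
trait JobNaming {
  def maxJobNameLength: Int

  /** Scheduler-specific clean-up; the default leaves the name unchanged. */
  def sanitize(name: String): String = name

  final def submittedJobName(fn: NamedFunction): String =
    sanitize(fn.shortDescription.take(maxJobNameLength))
}

// Same truncation and character filter as the DRMAA runner excerpt.
object DrmaaNaming extends JobNaming {
  val maxJobNameLength = 500
  override def sanitize(name: String): String = name.replaceAll("[^A-Za-z0-9_]", "_")
}

// LSF only truncates; the limit here is a placeholder.
object LsfNaming extends JobNaming {
  val maxJobNameLength = 4000
}
```

The runner-specific interaction with the cluster stays where it is; only the "which string becomes the name" decision moves up.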

• Member, Dev

I think your understanding of the LsfJobRunner is spot on. However, I also think this whole conversation is revolving around trading one problem for another (there's probably some awesome idiom for that in French that Geraldine can tell us about. Bonus points for including cabbage-based foods!).

There are three descriptive fields in QFunction (description, analysisName, and shortDescription). I think the right approach is to either decide which field to use as the job name - forever, no matter what runner - or to provide users an easy way to switch among them. From what I've been able to glean, the default values are:

1. analysisName: Set to the literal string '<function>' in QFunction, overridden to the name of the walker/Picard class in the extensions.
2. description: Set to analysisName: InputFileList > OutputFileList in QFunction, overridden to the command line in CommandLineFunction
3. shortDescription: Set to analysisName: FirstOutputFile in QFunction
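The defaults in the list above can be modelled with a small stand-in to see which field a scheduler would end up showing. `DescribedFunction` and `PrintReadsLike` are simplified, hypothetical classes (not the real QFunction or GATK extensions), and the exact formatting of the file lists is an assumption.

```scala
import java.io.File

// Simplified model of the three descriptive fields and their defaults.
trait DescribedFunction {
  def inputs: Seq[File]
  def outputs: Seq[File]

  private def inputList: String = inputs.mkString(" ")
  private def outputList: String = outputs.mkString(" ")
  private def firstOutput: String = outputs.headOption.map(_.toString).getOrElse("")

  def analysisName: String = "<function>"
  def description: String = s"$analysisName: $inputList > $outputList"
  def shortDescription: String = s"$analysisName: $firstOutput"
}

// Hypothetical extension-style class: analysisName becomes the walker name and,
// as in CommandLineFunction, description becomes the full command line.
class PrintReadsLike(input: File, output: File) extends DescribedFunction {
  def inputs: Seq[File] = Seq(input)
  def outputs: Seq[File] = Seq(output)
  override def analysisName: String = "PrintReads"
  def commandLine: String = s"java -jar GenomeAnalysisTK.jar -T PrintReads -I $input -o $output"
  override def description: String = commandLine
}
```

In this model `description` turns into the full command line for real jobs, while `shortDescription` stays short and still differs per output file, which is the property that makes it a reasonable default job name.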

Obviously, any of these can be overridden in the class of the final Scala script. So I would argue two things: (1) All runners should use the same field for the job name, and (2) As I understand your use case (and mine as well), you could achieve your desired result by overriding description instead of analysisName.

My feeling from reading the source is that @kshakir intended for 'analysisName' to be a description of the class, not the job (or object, if you want to think in OO terms), while 'description' and 'shortDescription' describe the job itself. Based on this, I would argue that the best jobName is shortDescription.

• Member

> (1) All runners should use the same field for the job name, and (2) As I understand your use case (and mine as well), you could achieve your desired result by overriding description instead of analysisName.

Agreed.

> Based on this, I would argue that the best jobName is shortDescription.

Yes. But I would add that an even better solution might be to add a jobRunnerJobName field (as jobName is already taken) which defaults to shortDescription. That way it might be easier for newcomers (such as myself) to see what's what.

This sounds reasonable to me -- I'll have someone look at your pull request. Is it up to date with what you mention here?

• Member, Dev

It doesn't look like the pull request has the latest changes we've been talking about, but I'm testing a patch locally that I'll post shortly.

Meanwhile we're discussing internally what is the best way to get code changes from you folks, as we don't use the public repo itself for development, and typically it's easier for us to work with patches than with pull requests to that repo. We only just realized we had open pull requests to that repo thanks to Johan's post. Sorry for the inconvenience, we'll try to figure out a workflow asap.

• Member, Dev

Here it is, I've tested that it does what I expect in my LSF environment. Geraldine, should I also email this to you?

```diff
From dfd63dd5e381867ea45b8868db120596bac9383d Mon Sep 17 00:00:00 2001
From: Phillip Dexheimer <phillip.dexheimer@cchmc.org>
Date: Tue, 12 Nov 2013 11:42:09 -0500
Subject: [PATCH] Changed name of jobs submitted to cluster job runners

-- Added 'jobRunnerJobName' definition to QFunction, defaults to value of shortDescription
-- Edited Lsf and Drmaa JobRunners to use this string instead of description for naming jobs in the scheduler
---
 3 files changed, 7 insertions(+), 2 deletions(-)

index 9cfd692..c7e569d 100644
@@ -50,7 +50,7 @@ class DrmaaJobRunner(val session: Session, val function: CommandLineFunction) ex
     session.synchronized {
       val drmaaJob: JobTemplate = session.createJobTemplate

-      drmaaJob.setJobName(function.description.take(jobNameLength).replaceAll(jobNameFilter, "_"))
+      drmaaJob.setJobName(function.jobRunnerJobName.take(jobNameLength).replaceAll(jobNameFilter, "_"))

       // Set the current working directory
       drmaaJob.setWorkingDirectory(function.commandDirectory.getPath)
index 1140c49..19c6f9b 100644
@@ -71,7 +71,7 @@ class Lsf706JobRunner(val function: CommandLineFunction) extends CommandLineJobR
       for (i <- 0 until LibLsf.LSF_RLIM_NLIMITS)
         request.rLimits(i) = LibLsf.DEFAULT_RLIMIT;

-      request.jobName = function.description.take(LibBat.MAX_JOB_NAME_LEN)
+      request.jobName = function.jobRunnerJobName.take(LibBat.MAX_JOB_NAME_LEN)
       request.options |= LibBat.SUB_JOB_NAME

       // Set the output file for stdout
@@ -149,6 +149,11 @@ trait QFunction extends Logging with QJobReport {
       case _ => analysisName
     }
   }
+
+  /**
+   * The name of the job as submitted to the job runner
+   */
+  def jobRunnerJobName = shortDescription

   /**
    * Returns true if the function is done.
--
1.7.11.1
```


Sure, please email it to me as an attachment if you don't mind. I believe you have my email already?

Great, thanks guys. The patch has been added to our development repository. It will be in the publicly available source when we next release.

• Member

I will make sure to test the patch with the drmaa job runner tomorrow, and close the pull request on github. Thanks, @pdexheimer for the fruitful discussion.

@Geraldine, what's the approximate time until the next release? I'm asking because I'm considering how to integrate this into my own release schedule. Also, updated guides for contributing to the GATK and Queue would be great.

• Member, Dev

I'm happy to have contributed, @Johan_Dahlberg; it was somewhat serendipitous that you resurrected this thread when you did. My solution to the problem would likely have been a local hack rather than a proper fix.

• Member

I can now confirm that this is working as expected on the drmaa jobrunner.

@Johan_Dahlberg, we don't have an ETA yet for the next release; we're working on two big pushes to improve HaplotypeCaller and VQSR, and it's difficult to estimate when they'll be done. I would guess at least a couple more weeks. I'll try to give a heads up when we have a more precise idea.

Will get to work on those guides for contributions.

By the way, if you have a GATK-based software package of your own (e.g. your own walkers etc) that you'd like to distribute to the community, we'd be happy to help you with that. Just let me know and we can discuss options.

• Member

hi @Johan_Dahlberg,
we're also in the process of replacing Torque with Slurm, and I'm testing Queue with Drmaa.
Obviously very interested in this fix to the job names you submitted (thanks for that!).

I'm only familiar with the basics of Scala, and I was trying to use your function by writing

```scala
this.jobRunnerJobName = "QueueRecal"
```


in a class within my data processing pipeline (the class for recalibrating the BAM files) that extends PrintReads with CommandLineGATKArgs, but I get an error saying jobRunnerJobName is not a member of that class.

Clearly I'm writing something wrong in the code.
In which class/trait or other part of the code should you set this property in order to give each GATK step a different name?

thanks!
Francesco

• Member, Dev

@flescai, could you provide a little more context? What you describe should work; maybe the definition of the class in question would help clear up the error. Also, make sure you're using a recent enough version: the patch went in at version 2.8, I think.

• Member

Sure @pdexheimer
I'll simplify the code by using ExampleCountReads.scala, slightly modified to show the two cases where I get this error.

```scala
/*
 *
 * Permission is hereby granted, free of charge, to any person
 * obtaining a copy of this software and associated documentation
 * files (the "Software"), to deal in the Software without
 * restriction, including without limitation the rights to use,
 * copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following
 * conditions:
 *
 * The above copyright notice and this permission notice shall be
 * included in all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
 * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
 * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
 * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
 * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR
 * THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 */

/**
 * An introductory pipeline for Queue.
 * Runs the GATK CountReads individually and across a set of bams.
 * All bams must have the same reference.
 */
  @Input(doc="The reference file for the bam files.", shortName="R")
  var referenceFile: File = null

  // NOTE: Do not initialize List, Set, or Option to null
  // as you won't be able to update the collection.
  // By default set:
  //   List[T] = Nil
  //   Set[T] = Set.empty[T]
  //   Option[T] = None
  @Input(doc="One or more bam files.", shortName="I")
  var bamFiles: List[File] = Nil

  /**
   * In script, you create and then add() functions to the pipeline.
   */
  def script() {

    // Run CountReads for all bams jointly.

    // Create a new CountReads from the Queue GATK Extensions.
    // The names of walkers are the same as you would use for '-T <WalkerName>'

    // Each field in the extensions is based off of the full form of the arguments.
    // To get the list of arguments and their descriptions run
    // java -jar <path to GenomeAnalysisTK.jar> -T <WalkerName> -help

    // GATK inputs that take more than one file will have a singular name which
    // matches the full form of the argument, but will actually be a scala List[]

    // Set the memory limit. Also acts as a memory request on LSF and GridEngine.

    // Add the newly created function to the pipeline.

    // If there is more than one BAM, also run CountReads once for each bam.
    if (bamFiles.size > 1) {
      for (bamFile <- bamFiles) {
        // ':+' is the scala List append operator
      }
    }
  }

    this.reference_sequence = referenceFile
    this.input_file = bamFiles
    this.jobNativeArgs = Seq("--mem=1000")
    this.jobRunnerJobName = "QueueCount"
  }

}
```


I tried to change the job name in the DRMAA/SLURM submission, using either

```scala
val jointCountReads = new CountReads
```


or from within a class (which better represents the more general way I usually write the code)

```scala
class myCounts extends CountReads {
  this.reference_sequence = referenceFile
  this.input_file = bamFiles
  this.jobNativeArgs = Seq("--mem=1000")
  this.jobRunnerJobName = "QueueCount"
}
```


but in both cases I get the error

```
INFO  10:05:29,368 QScriptManager - Compiling 1 QScript
ERROR 10:05:30,684 QScriptManager - ExampleCountReads.scala:93: value jobRunnerJobName_= is not a member of ExampleCountReads.this.myCounts
ERROR 10:05:30,685 QScriptManager -     this.jobRunnerJobName = "QueueCount"
ERROR 10:05:30,686 QScriptManager -              ^
ERROR 10:05:30,694 QScriptManager - two errors found
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager$lzycompute(QCommandLine.scala:95)
at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:93)
##### ERROR ------------------------------------------------------------------------------------------
```


This clearly shows that, even after the thread above and the patch that followed @Johan_Dahlberg's post, I haven't understood what jobRunnerJobName is a property of :-)
I would just appreciate some advice on how to set this property for different jobs.

cheers,
Francesco

• Member, Dev

Oh, I see the problem. "jobRunnerJobName" is not a field in QFunction; it's actually a method. Inside the subclass, put this:

```scala
override def jobRunnerJobName = "QueueCount"
```
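To see why the assignment fails and the override works, here is a self-contained sketch with simplified stand-ins; `QFunctionLike`, `CountReadsLike`, `MyCounts` and `NamedCountReads` are hypothetical names, not Queue classes. Because the patch declares `jobRunnerJobName` with `def` and no setter, `this.jobRunnerJobName = ...` has nothing to call: the compiler looks for a `jobRunnerJobName_=` method, which matches the error above. An anonymous subclass covers the `val jointCountReads = new CountReads` case, and a constructor parameter gives each step its own name; presumably the same patterns apply to the real `CountReads`, but that is untested here.

```scala
// Stand-in for the relevant part of QFunction after the patch: a parameterless
// method with a default, and no `jobRunnerJobName_=` setter.
trait QFunctionLike {
  def shortDescription: String
  def jobRunnerJobName: String = shortDescription
}

// Hypothetical stand-in for a GATK extension class such as CountReads.
class CountReadsLike extends QFunctionLike {
  def shortDescription: String = "CountReads: out.txt"
}

// Named subclass: override the method instead of assigning to it.
class MyCounts extends CountReadsLike {
  override def jobRunnerJobName: String = "QueueCount"
}

// One name per instance, e.g. one per GATK step or per sample.
class NamedCountReads(name: String) extends CountReadsLike {
  override def jobRunnerJobName: String = name
}
```

For a function created inline, an anonymous subclass does the same job, e.g. `new CountReadsLike { override def jobRunnerJobName: String = "QueueCountJoint" }`.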
• Member

fantastic! thanks!