Am I missing something with GridEngine? Jobs not Running in parallel

louws, San Francisco (Member)

Hi!
I am happy to report that Queue and all the necessary tests for running GridEngine passed. The issue I am having is using a custom qscript to run a job in parallel. When I run the job on the cluster via qsub it runs in serial. Would someone be willing to look at my qsub syntax and my qscript to see if I am forgetting something?

The Qscript was a modified UnifiedGenotyper script configured to work with HaplotypeCaller:
`package org.broadinstitute.sting.queue.qscripts.examples

import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

class Haplotyper extends QScript {
  // Alias for Haplotyper.this, so the nested trait can reach the script's arguments.
  qscript =>

  @Input(doc="The reference file for the bam files.", shortName="R")
  var referenceFile: File = _ // _ is scala shorthand for null

  @Input(doc="Bam file to genotype.", shortName="I")
  var bamFile: File = _

  // Optional; declared here because the trait below reads qscript.intervals.
  @Input(doc="An optional file with a list of intervals to process.", shortName="L", required=false)
  var intervals: File = _

  @Input(doc="Output file.", shortName="o")
  var outputFile: File = _

  // Shared GATK arguments (trait name kept from the UnifiedGenotyper example).
  trait UnifiedGenotyperArguments extends CommandLineGATK {
    this.reference_sequence = qscript.referenceFile
    this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
    this.memoryLimit = 2 // gigabytes per command
  }

  def script() {
    val genotyper = new HaplotypeCaller with UnifiedGenotyperArguments

    genotyper.scatterCount = 12 // split the run into 12 scatter jobs
    genotyper.input_file :+= qscript.bamFile
    genotyper.out = swapExt(outputFile, qscript.bamFile, "bam", "vcf")

    add(genotyper)
  }
}`

My Queue command line was:
java -Djava.io.tmpdir=tmp -jar /location/of/queue/Queue.jar -S scripts/qscalascripts/haplotyper.scala -R human_g1k_v37 -I /source/input_file -o /destination/output/file -l debug -jobRunner GridEngine -run

When I use the above, Queue breaks my job up into 12 discrete pieces, but runs them all on one node of the cluster. Any pointers are most welcome.
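For readers following along, here is a rough sketch of what a scatter count of 12 means conceptually: the work list is divided into at most 12 groups, and each group becomes its own job. This is only an illustration in plain Scala, not Queue's actual scatter implementation; `ScatterSketch`, `Interval`, and `scatter` are hypothetical names made up for this example.

```scala
// Illustrative only: a plain-Scala model of splitting work into scatter pieces.
// None of these names come from Queue or GATK.
object ScatterSketch {
  // A genomic interval, reduced to the fields needed for the illustration.
  final case class Interval(contig: String, start: Int, end: Int)

  // Split `intervals` into at most `scatterCount` contiguous groups of
  // near-equal size, preserving the original order.
  def scatter(intervals: Seq[Interval], scatterCount: Int): Seq[Seq[Interval]] = {
    require(scatterCount > 0, "scatterCount must be positive")
    if (intervals.isEmpty) Seq.empty
    else {
      // Never create more groups than there are intervals.
      val parts = math.min(scatterCount, intervals.size)
      val base = intervals.size / parts
      val extra = intervals.size % parts
      // The first `extra` groups receive one additional interval.
      val sizes = Seq.tabulate(parts)(i => if (i < extra) base + 1 else base)
      val offsets = sizes.scanLeft(0)(_ + _)
      sizes.indices.map(i => intervals.slice(offsets(i), offsets(i + 1)))
    }
  }

  def main(args: Array[String]): Unit = {
    // One made-up interval per contig, purely as sample input.
    val contigs = (1 to 22).map(_.toString) ++ Seq("X", "Y", "MT")
    val intervals = contigs.map(c => Interval(c, 1, 1000))
    scatter(intervals, 12).zipWithIndex.foreach { case (piece, i) =>
      println(s"piece ${i + 1}: ${piece.map(_.contig).mkString(",")}")
    }
  }
}
```

The split only decides how the work is divided. Where each piece actually runs is decided by the job runner and the cluster scheduler, which is why 12 pieces can still end up on a single node.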

Answers

  • louws, San Francisco (Member)

    I think I may have stumbled on a possible answer, and it has nothing to do with Queue or the qscript. I tested the command from within a node, and everything ran in parallel. It is only when I submit it from the head node of the cluster via qsub that it runs in serial. Thank you everyone for taking the time to take a look.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Thanks for reporting back, and glad to hear that you've got it figured out.
