Scatter/Gather utilizing multiple cores in the same node/job

jesseerdmann Member
edited April 2013 in Ask the GATK team

At the Minnesota Supercomputing Institute, jobs on our high-performance clusters must reserve an entire node. I have implemented my own Torque Manager/Runner for our environment, based on the Grid Engine Manager/Runner. To get this working, I set nCoresRequest for the scatter/gather method to the required minimum of eight. My understanding is that for IndelRealigner, for example, each job reserves a node with eight cores but uses only one. That means our users' compute time allocation would be consumed eight times faster than necessary.

What I am wondering is whether there are options I am missing that would let some number of scatter/gather requests be grouped into a single job submission. If I were writing this as a PBS script for our environment and wanted to use 16 cores in a scatter/gather implementation, I would write two jobs, each with eight commands. They would look something like the following:

#PBS Job Configuration stuff
pbsdsh -n 0 java -jar ... &
pbsdsh -n 1 java -jar ... &
pbsdsh -n 2 java -jar ... &
pbsdsh -n 3 java -jar ... &
pbsdsh -n 4 java -jar ... &
pbsdsh -n 5 java -jar ... &
pbsdsh -n 6 java -jar ... &
pbsdsh -n 7 java -jar ... &
wait  # keep the job alive until all eight background tasks finish

Has anyone done something similar in Queue? Any pointers? Thanks in advance!
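
For reference, the grouping described above can be sketched outside of Queue as a small shell helper. This is only a minimal sketch under stated assumptions, not something Queue provides: the function name make_batches, the batch_<k>.pbs file names, and the one-command-per-line input are all hypothetical, and the "#PBS Job Configuration stuff" line is the same placeholder as in the script above. It packs single-core commands into job scripts of at most N commands each, ending every script with a wait so the job does not exit before its background tasks finish.

```shell
#!/usr/bin/env bash
# Sketch: pack single-core commands into whole-node PBS job scripts.
# Assumptions (not from the thread): commands arrive one per line on stdin,
# and the output names batch_<k>.pbs are hypothetical.

# make_batches CORES_PER_NODE OUT_DIR
# Reads commands from stdin and writes OUT_DIR/batch_<k>.pbs, each holding
# at most CORES_PER_NODE commands launched in the background via pbsdsh.
make_batches() {
    local cores="$1" out_dir="$2"
    local slot=0 batch=0 script="" cmd
    mkdir -p "$out_dir"
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        [ -z "$cmd" ] && continue            # skip blank lines
        if [ "$slot" -eq 0 ]; then           # start a new job script
            script="$out_dir/batch_${batch}.pbs"
            printf '%s\n' '#!/bin/bash' '#PBS Job Configuration stuff' > "$script"
        fi
        printf 'pbsdsh -n %d %s &\n' "$slot" "$cmd" >> "$script"
        slot=$((slot + 1))
        if [ "$slot" -eq "$cores" ]; then    # node is full: close this script
            printf 'wait\n' >> "$script"
            slot=0
            batch=$((batch + 1))
        fi
    done
    # Close a final, partially filled script.
    if [ "$slot" -ne 0 ]; then
        printf 'wait\n' >> "$script"
    fi
}
```

With 16 commands in a file and eight cores per node, `make_batches 8 jobs < commands.txt` would write jobs/batch_0.pbs and jobs/batch_1.pbs, which could then be submitted individually with qsub. The scatter commands themselves would still have to be taken from Queue by hand, so this does not replace a proper job runner.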


  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Hi there,

    Unfortunately, for tools like IndelRealigner that cannot be multi-threaded, there is nothing we can do for you, since the nCoresRequest parameter is a property of your cluster software. There is no built-in option to do what you want within Queue. Perhaps other users may suggest a workaround...

  • ryanabashbash Oak Ridge National Laboratory Member

    I seem to have run into something similar when trying to run a QScript on a cluster. When writing the QScript, I tested it on a single machine running SGE, and it happily ran many scatter/gather jobs in parallel on that single node. However, once I moved over to the cluster (running UGE), it will run multiple BWA jobs from the QScript on a single node, but it launches only one GATK job (e.g. IndelRealigner) per node.

    Here is the class I use for BWA alignment that has no problem launching multiple instances on a single node in the cluster.

    class BWAalignment(BWApath: File, BWAnumThreads: Int, readGroup: String, BWAindex: File, input: File, output: File) extends CommandLineFunction {
       @Input(doc="Input FASTQ")
       var inputFastq: File = input
       @Output(doc="Output SAM")
       var outputSAM: File = output
       this.jobName = "BWA_alignment"
       this.analysisName = "BWA_alignment"
       this.nCoresRequest = BWAnumThreads
       this.residentRequest = 6000
       this.jobEnvironmentNames = List("mpi " + BWAnumThreads)  //get smp_pe not valid errors without this.
       def commandLine = {
          BWApath + " mem " + " -t " + BWAnumThreads + " -R \"" + readGroup + "\" " + BWAindex + " " + inputFastq + " > " + outputSAM
       }
    }

    This is what I use a little later on in the script to define the IndelRealigner, but only one job ever launches on a single node.

    var indelRealignment = new IndelRealigner
    indelRealignment.reference_sequence = referenceFile
    indelRealignment.input_file = List(PicardOutput)
    indelRealignment.targetIntervals = targetCreation.out
    indelRealignment.out = swapExt(PicardOutput, "bam", "realigned.bam")
    indelRealignment.scatterCount = GATKnumScatter
    indelRealignment.jobEnvironmentNames = List("mpi " + "1")  //get smp_pe not valid errors without this.

    Given that the CommandLineFunction is able to launch multiple instances on a single node (e.g. 4 instances of BWA, each consuming 4 processors on a 16 processor node), is there a way to launch multiple IndelRealigners on a single node (e.g. 16 instances of IndelRealigner on a 16 processor node)? It seems I'm fundamentally misunderstanding something since things worked fine on the single machine running SGE.