To celebrate the release of GATK 4.0, we are giving away free credits for running the GATK4 Best Practices pipelines in FireCloud, our secure online analysis portal. It’s first come first serve, so sign up now to claim your free credits worth $250. Sponsored by Google Cloud. Learn more at

GATK-Queue: script fails to call multiple processors


I adapted the example UnifiedGenotyper script to run HaplotypeCaller. Unfortunately, so far I haven't been able to get it to use multiple CPUs. I am using a server with 64 cores (AMD Opteron) and 512 gb RAM. Invoking the -nct option made no difference either. The example job is a highly heterogeneous set of 6 samples with a 270 mbp reference. Predicted runtime: 6 weeks (and we will need to genotype batches of up to 40 samples...). The script creates ten output directories (/.queue/scatterGather/hc-1-sg/temp_01_of_10, etc), but files are only produced in one of them.
The local guru cannot spot the problem either. I attach the script - I will appreciate some feedback.

As a sidenote, does the memoryLimit refer to the entire process, or is it per core?

Many thanks,


/queue -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 4 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test2.10cpu.4nct.output.vcf -run


//kk443 AT 30/05/2014 Script to run the GATK HaplotypeCaller with Queue parallelism.

package org.broadinstitute.sting.queue.qscripts.examples
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

class ExampleUnifiedGenotyper extends QScript {

qscript =>

// Required arguments. All initialized to empty values.

@Input(doc="The reference file for the bam files.", shortName="R")
var referenceFile: File = _ // _ is scala shorthand for null

@Input(doc="Bam file to genotype.", shortName="I")
var bamFile: File = _

var hets: Float = _

var indelHeterozygosity: Float = _

var o: File = _

// The following arguments are all optional.

@Argument(shortName="SCC", required=false)
var stand_call_conf: Int = _

@Argument(shortName="MBQ", required=false)
var mbq: Int = _

@Argument(shortName="NCT", required=false)
var nct: Int = _

// This trait allows us set the variables below in one place,
// and then reuse this trait on each CommandLineGATK function below.
trait HaplotypeCallerArguments extends CommandLineGATK {
this.reference_sequence = qscript.referenceFile
this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
// Set the memory limit to 2 gigabytes on each command.
this.memoryLimit = 40

def script() {
// Create the four functions that we may run depending on options.
val genotyper = new HaplotypeCaller with HaplotypeCallerArguments

genotyper.scatterCount = 10
genotyper.input_file :+= qscript.bamFile
genotyper.out = swapExt(qscript.bamFile, "bam", "unfiltered.vcf")




INFO 23:27:36,528 QScriptManager - Compiling 1 QScript
INFO 23:27:44,663 QScriptManager - Compilation complete
INFO 23:27:44,806 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,806 HelpFormatter - Queue v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:09
INFO 23:27:44,807 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 23:27:44,807 HelpFormatter - For support and documentation go to
INFO 23:27:44,808 HelpFormatter - Program Args: -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 2 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test4.20cpu.2nct.200GB.output.vcf -run
INFO 23:27:44,809 HelpFormatter - Executing as kk443@bonobo--bio on Linux 3.2.0-60-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 23:27:44,809 HelpFormatter - Date/Time: 2014/06/01 23:27:44
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,818 QCommandLine - Scripting ExampleUnifiedGenotyper
INFO 23:27:44,969 QCommandLine - Added 1 functions
INFO 23:27:44,971 QGraph - Generating graph.
INFO 23:27:44,994 QGraph - Generating scatter gather jobs.
INFO 23:27:45,058 QGraph - Removing original jobs.
INFO 23:27:45,061 QGraph - Adding scatter gather jobs.
INFO 23:27:45,282 QGraph - Regenerating graph.
INFO 23:27:45,362 QGraph - Running jobs.
INFO 23:27:45,512 FunctionEdge - Starting: 'java' '-Xmx40960m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '' '-cp' '/whale-data/biosoft/src/Queue-3.1-1/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/whale-data/kk443/mapping/AAAfinished/test_bams.list' '-L' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/scatter.intervals' '-R' '/whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta' '-o' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf'
INFO 23:27:45,512 FunctionEdge - Output written to /whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf.out

Best Answer


Sign In or Register to comment.