Our documentation websites are currently offline due to a data center fire. We do not yet have an ETA for restoring service; we’ll update this message when we know more.

GATK-Queue: script fails to call multiple processors

Hello,

I adapted the example UnifiedGenotyper script to run HaplotypeCaller. Unfortunately, so far I haven't been able to get it to use multiple CPUs. I am using a server with 64 cores (AMD Opteron) and 512 gb RAM. Invoking the -nct option made no difference either. The example job is a highly heterogeneous set of 6 samples with a 270 mbp reference. Predicted runtime: 6 weeks (and we will need to genotype batches of up to 40 samples...). The script creates ten output directories (/.queue/scatterGather/hc-1-sg/temp_01_of_10, etc), but files are only produced in one of them.
The local guru cannot spot the problem either. I attach the script - I will appreciate some feedback.

As a sidenote, does the memoryLimit refer to the entire process, or is it per core?

Many thanks,
Chris

Command:

/queue -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 4 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test2.10cpu.4nct.output.vcf -run

Script:

//kk443 AT cam.ac.uk 30/05/2014 Script to run the GATK HaplotypeCaller with Queue parallelism.

package org.broadinstitute.sting.queue.qscripts.examples
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

class ExampleUnifiedGenotyper extends QScript {

qscript =>

// Required arguments. All initialized to empty values.

@Input(doc="The reference file for the bam files.", shortName="R")
var referenceFile: File = _ // _ is scala shorthand for null

@Input(doc="Bam file to genotype.", shortName="I")
var bamFile: File = _

@Argument(shortName="H")
var hets: Float = _

@Argument(shortName="IH")
var indelHeterozygosity: Float = _

@Output(shortName="O")
var o: File = _

// The following arguments are all optional.

@Argument(shortName="SCC", required=false)
var stand_call_conf: Int = _

@Argument(shortName="MBQ", required=false)
var mbq: Int = _

@Argument(shortName="NCT", required=false)
var nct: Int = _

// This trait allows us set the variables below in one place,
// and then reuse this trait on each CommandLineGATK function below.
trait HaplotypeCallerArguments extends CommandLineGATK {
this.reference_sequence = qscript.referenceFile
this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
// Set the memory limit to 2 gigabytes on each command.
this.memoryLimit = 40
}

def script() {
// Create the four functions that we may run depending on options.
val genotyper = new HaplotypeCaller with HaplotypeCallerArguments

genotyper.scatterCount = 10
genotyper.input_file :+= qscript.bamFile
genotyper.out = swapExt(qscript.bamFile, "bam", "unfiltered.vcf")

add(genotyper)

}
}

StdOut:

INFO 23:27:36,528 QScriptManager - Compiling 1 QScript
INFO 23:27:44,663 QScriptManager - Compilation complete
INFO 23:27:44,806 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,806 HelpFormatter - Queue v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:09
INFO 23:27:44,807 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 23:27:44,807 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 23:27:44,808 HelpFormatter - Program Args: -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 2 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test4.20cpu.2nct.200GB.output.vcf -run
INFO 23:27:44,809 HelpFormatter - Executing as kk443@bonobo--bio on Linux 3.2.0-60-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 23:27:44,809 HelpFormatter - Date/Time: 2014/06/01 23:27:44
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,818 QCommandLine - Scripting ExampleUnifiedGenotyper
INFO 23:27:44,969 QCommandLine - Added 1 functions
INFO 23:27:44,971 QGraph - Generating graph.
INFO 23:27:44,994 QGraph - Generating scatter gather jobs.
INFO 23:27:45,058 QGraph - Removing original jobs.
INFO 23:27:45,061 QGraph - Adding scatter gather jobs.
INFO 23:27:45,282 QGraph - Regenerating graph.
INFO 23:27:45,362 QGraph - Running jobs.
INFO 23:27:45,512 FunctionEdge - Starting: 'java' '-Xmx40960m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/whale-data/kk443/mapping/AAAfinished/.queue/tmp' '-cp' '/whale-data/biosoft/src/Queue-3.1-1/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/whale-data/kk443/mapping/AAAfinished/test_bams.list' '-L' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/scatter.intervals' '-R' '/whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta' '-o' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf'
INFO 23:27:45,512 FunctionEdge - Output written to /whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf.out

Best Answer

Answers

Sign In or Register to comment.