Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

GATK-Queue: script fails to call multiple processors

KrzysztofMKozakKrzysztofMKozak Cambridge, UKMember

Hello,

I adapted the example UnifiedGenotyper script to run HaplotypeCaller. Unfortunately, so far I haven't been able to get it to use multiple CPUs. I am using a server with 64 cores (AMD Opteron) and 512 gb RAM. Invoking the -nct option made no difference either. The example job is a highly heterogeneous set of 6 samples with a 270 mbp reference. Predicted runtime: 6 weeks (and we will need to genotype batches of up to 40 samples...). The script creates ten output directories (/.queue/scatterGather/hc-1-sg/temp_01_of_10, etc), but files are only produced in one of them.
The local guru cannot spot the problem either. I attach the script - I will appreciate some feedback.

As a sidenote, does the memoryLimit refer to the entire process, or is it per core?

Many thanks,
Chris

Command:

/queue -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 4 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test2.10cpu.4nct.output.vcf -run

Script:

//kk443 AT cam.ac.uk 30/05/2014 Script to run the GATK HaplotypeCaller with Queue parallelism.

package org.broadinstitute.sting.queue.qscripts.examples
import org.broadinstitute.sting.queue.QScript
import org.broadinstitute.sting.queue.extensions.gatk._

class ExampleUnifiedGenotyper extends QScript {

qscript =>

// Required arguments. All initialized to empty values.

@Input(doc="The reference file for the bam files.", shortName="R")
var referenceFile: File = _ // _ is scala shorthand for null

@Input(doc="Bam file to genotype.", shortName="I")
var bamFile: File = _

@Argument(shortName="H")
var hets: Float = _

@Argument(shortName="IH")
var indelHeterozygosity: Float = _

@Output(shortName="O")
var o: File = _

// The following arguments are all optional.

@Argument(shortName="SCC", required=false)
var stand_call_conf: Int = _

@Argument(shortName="MBQ", required=false)
var mbq: Int = _

@Argument(shortName="NCT", required=false)
var nct: Int = _

// This trait allows us set the variables below in one place,
// and then reuse this trait on each CommandLineGATK function below.
trait HaplotypeCallerArguments extends CommandLineGATK {
this.reference_sequence = qscript.referenceFile
this.intervals = if (qscript.intervals == null) Nil else List(qscript.intervals)
// Set the memory limit to 2 gigabytes on each command.
this.memoryLimit = 40
}

def script() {
// Create the four functions that we may run depending on options.
val genotyper = new HaplotypeCaller with HaplotypeCallerArguments

genotyper.scatterCount = 10
genotyper.input_file :+= qscript.bamFile
genotyper.out = swapExt(qscript.bamFile, "bam", "unfiltered.vcf")

add(genotyper)

}
}

StdOut:

INFO 23:27:36,528 QScriptManager - Compiling 1 QScript
INFO 23:27:44,663 QScriptManager - Compilation complete
INFO 23:27:44,806 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,806 HelpFormatter - Queue v3.1-1-g07a4bf8, Compiled 2014/03/18 06:09:09
INFO 23:27:44,807 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 23:27:44,807 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 23:27:44,808 HelpFormatter - Program Args: -S hc.scala -R /whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta -I test_bams.list -NCT 2 -SCC 20 -MBQ 20 -H 0.1 -IH 0.001 -O test4.20cpu.2nct.200GB.output.vcf -run
INFO 23:27:44,809 HelpFormatter - Executing as [email protected] on Linux 3.2.0-60-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 23:27:44,809 HelpFormatter - Date/Time: 2014/06/01 23:27:44
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,810 HelpFormatter - ----------------------------------------------------------------------
INFO 23:27:44,818 QCommandLine - Scripting ExampleUnifiedGenotyper
INFO 23:27:44,969 QCommandLine - Added 1 functions
INFO 23:27:44,971 QGraph - Generating graph.
INFO 23:27:44,994 QGraph - Generating scatter gather jobs.
INFO 23:27:45,058 QGraph - Removing original jobs.
INFO 23:27:45,061 QGraph - Adding scatter gather jobs.
INFO 23:27:45,282 QGraph - Regenerating graph.
INFO 23:27:45,362 QGraph - Running jobs.
INFO 23:27:45,512 FunctionEdge - Starting: 'java' '-Xmx40960m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/whale-data/kk443/mapping/AAAfinished/.queue/tmp' '-cp' '/whale-data/biosoft/src/Queue-3.1-1/Queue.jar' 'org.broadinstitute.sting.gatk.CommandLineGATK' '-T' 'HaplotypeCaller' '-I' '/whale-data/kk443/mapping/AAAfinished/test_bams.list' '-L' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/scatter.intervals' '-R' '/whale-data/kk443/mapping/scaffolds_cinnabar_W.fasta' '-o' '/whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf'
INFO 23:27:45,512 FunctionEdge - Output written to /whale-data/kk443/mapping/AAAfinished/.queue/scatterGather/hc-1-sg/temp_01_of_10/test_bams.listunfiltered.vcf.out

Best Answer

Answers

Sign In or Register to comment.