How to run BaseRecalibrator Queue scripts in parallel

zek12 (London, Member)

Hi there

I am trying to optimise GATK BaseRecalibrator using Queue. I have read through the GATK forum, the documentation, etc., and implemented my QScript (MyBaseRecalibrator.scala) as follows:

package org.broadinstitute.gatk.queue.qscripts.examples

import org.broadinstitute.gatk.queue.QScript
import org.broadinstitute.gatk.queue.extensions.gatk._

class MyBaseRecalibrator extends QScript {
  qscript =>

  @Input(doc="The reference file for the bam files.", shortName="R")
  var referenceFile: File = _

  @Input(doc="Bam file to genotype.", shortName="I")
  var bamFile: File = _

  @Input(doc="One or more VCF files.", shortName="knownSites")
  var knownSites: List[File] = Nil

  @Argument(doc="Number of cpu threads per data thread", shortName="nct", required=false)
  var numCPUThreads: Int = _

  @Argument(doc="Number of scatters", shortName="nsc", required=false)
  var numScatters: Int = _

  @Argument(doc="Maxmem.", shortName="mem", required=false)
  var maxMem: Int = _

  @Output(doc="Recal table", shortName="o")
  var outFile: File = _

  def script() {
    // Use a distinct name for the instance so it does not shadow the class name.
    val recal = new BaseRecalibrator
    recal.reference_sequence = referenceFile
    // bamFile and outFile are already File values, so no need to wrap them.
    recal.input_file = List(bamFile)
    recal.knownSites = knownSites
    recal.out = outFile

    // Per-job memory, number of scatter chunks, and CPU threads per job.
    recal.memoryLimit = maxMem
    recal.scatterCount = numScatters
    recal.nct = numCPUThreads
    add(recal)
  }
}

Then I tried to run it:

java -jar Queue.jar -S MyBaseRecalibrator.scala \
-R reference_genome.fa -I sample.bam \
-knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
-knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
-knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
-o sample.recal_table \
-nct 16 -nsc 4 -mem 4 \
-run -startFromScratch

This correctly splits the data into 4 chunks. The problem is that the jobs run one at a time, so there is no speed-up. I have seen in other posts that the -bsub flag is needed for the chunks to be processed in parallel. I therefore changed the command to:

java -jar Queue.jar -S MyBaseRecalibrator.scala \
-R reference_genome.fa -I sample.bam \
-knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
-knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
-knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
-o sample.recal_table \
-nct 16 -nsc 4 -mem 4 \
-run -bsub -startFromScratch

And the following error comes up:

INFO 15:47:23,344 QScriptManager - Compiling 1 QScript
INFO 15:47:33,308 QScriptManager - Compilation complete
INFO 15:47:33,477 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,477 HelpFormatter - Queue v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:07
INFO 15:47:33,478 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 15:47:33,478 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 15:47:33,478 HelpFormatter - Program Args: -S MyBaseRecalibrator.scala -R reference_genome.fa -I Sample.bam -knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf -knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf -knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf -o Sample.recal_table -nct 16 -nsc 4 -mem 4 -run -bsub -startFromScratch
INFO 15:47:33,478 HelpFormatter - Executing as [email protected] on Linux 2.6.32-279.19.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17.
INFO 15:47:33,479 HelpFormatter - Date/Time: 2017/01/27 15:47:33
INFO 15:47:33,479 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,480 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,486 QCommandLine - Scripting MyBaseRecalibrator
INFO 15:47:33,672 QCommandLine - Added 1 functions
INFO 15:47:33,682 QGraph - Generating graph.
INFO 15:47:33,724 QGraph - Generating scatter gather jobs.
INFO 15:47:34,435 QGraph - Removing original jobs.
INFO 15:47:34,436 QGraph - Adding scatter gather jobs.
INFO 15:47:34,615 QGraph - Regenerating graph.
INFO 15:47:34,634 QGraph - Running jobs.
INFO 15:47:34,661 QGraph - Removing outputs from previous runs.
INFO 15:47:34,931 FunctionEdge - Starting: ReadScatterFunction: List(bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx, bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf, Sample.bam, bundle2.8/2.8/b37/dbsnp_138.b37.vcf.idx, Sample.bam.bai, bundle2.8/2.8/b37/dbsnp_138.b37.vcf, reference_genome.fa, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf.idx, Sample.bai) > List(.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_2_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_3_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_4_of_4/scatter.intervals)
INFO 15:47:34,932 FunctionEdge - Output written to .queue/scatterGather/MyBaseRecalibrator-1-sg/scatter/scatter.out
INFO 15:47:35,275 QGraph - 6 Pend, 1 Run, 0 Fail, 0 Done
INFO 15:48:04,942 FunctionEdge - Done: ReadScatterFunction: List(bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx, bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf, Sample.bam, bundle2.8/2.8/b37/dbsnp_138.b37.vcf.idx, Sample.bam.bai, bundle2.8/2.8/b37/dbsnp_138.b37.vcf, reference_genome.fa, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf.idx, Sample.bai) > List(.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_2_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_3_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_4_of_4/scatter.intervals)
INFO 15:48:04,942 QGraph - Writing incremental jobs reports...
INFO 15:48:04,943 QJobsReporter - Writing JobLogging GATKReport to file MyBaseRecalibrator.jobreport.txt
INFO 15:48:05,081 FunctionEdge - Starting: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=.queue/tmp' '-cp' '/apps/gatk3/3.7-0/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'BaseRecalibrator' '-I' 'Sample.bam' '-L' '.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals' '-R' 'reference_genome.fa' '-nct' '16' '-knownSites' 'bundle2.8/2.8/b37/dbsnp_138.b37.vcf' '-knownSites' 'bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf' '-knownSites' 'bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf' '-o' '.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/Sample.recal_table'
INFO 15:48:05,081 FunctionEdge - Output written to .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/Sample.recal_table.out
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000339c732d5f, pid=24019, tid=47381559400192
#
# JRE version: Java(TM) SE Runtime Environment (8.0_66-b17) (build 1.8.0_66-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.66-b17 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x132d5f]
#
# Core dump written. Default location: core or core.24019
#
# An error report file with more information is saved as:
# hs_err_pid24019.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)

I am using:
- GATK 3.7
- JRE 1.8.0_66
- LSF 9.1.2

Any help is welcome. Thanks in advance.

GitHub issue 1698, filed by Sheila. State: closed. Closed by: vdauwera.

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @zek12, it looks like you're doing the right thing as far as Queue is concerned. I would say the core dump suggests your server is unhappy with something, possibly due to resource limitations, but diagnosing that is beyond the scope of support we can provide. Personally I would check if removing the multithreading makes the problem go away, because multithreading is often a culprit in weird crashes.
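One way to try the suggestion above is to rerun the same command with a single CPU thread per job, leaving the scatter count and -bsub unchanged. The following is only a sketch under that assumption: the file names are copied from the question, and `queue_cmd` is a hypothetical helper introduced here for illustration, not part of Queue. It prints the command so it can be reviewed before anything is submitted.

```shell
# Sketch: rebuild the Queue command from the question with per-job
# multithreading turned down to one thread (-nct 1).
# queue_cmd is a hypothetical helper; file names are those from the question.
queue_cmd() {
    # $1 is the value passed to -nct (CPU threads per scattered job).
    printf '%s ' java -jar Queue.jar -S MyBaseRecalibrator.scala \
        -R reference_genome.fa -I sample.bam \
        -knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
        -knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
        -knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
        -o sample.recal_table \
        -nct "$1" -nsc 4 -mem 4 \
        -run -bsub -startFromScratch
    printf '\n'
}

# Print the single-threaded variant for review; nothing is executed here.
queue_cmd 1

# To actually launch it, uncomment the next line:
# eval "$(queue_cmd 1)"
```

If the crash disappears with -nct 1, that points towards multithreading (or the resources it needs on the execution host) rather than the scatter-gather setup. If it persists, the hs_err_pid log named in the crash report is the next place to look.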
