How to run BaseRecalibrator Queue scripts in parallel

Hi there

I am trying to optimise GATK BaseRecalibrator using Queue. I have read through the GATK forum, the documentation, etc., and implemented my QScript (MyBaseRecalibrator.scala) as follows:

package org.broadinstitute.gatk.queue.qscripts.examples

import org.broadinstitute.gatk.queue.QScript
import org.broadinstitute.gatk.queue.extensions.gatk._

class MyBaseRecalibrator extends QScript {

  qscript =>

  @Input(doc="The reference file for the BAM file.", shortName="R")
  var referenceFile: File = _

  @Input(doc="BAM file to recalibrate.", shortName="I")
  var bamFile: File = _

  @Input(doc="One or more VCF files of known sites.", shortName="knownSites")
  var knownSites: List[File] = Nil

  @Argument(doc="Number of CPU threads per data thread.", shortName="nct", required=false)
  var numCPUThreads: Int = _

  @Argument(doc="Number of scatter jobs.", shortName="nsc", required=false)
  var numScatters: Int = _

  @Argument(doc="Maximum memory per job, in GB.", shortName="mem", required=false)
  var maxMem: Int = _

  @Output(doc="Recalibration table.", shortName="o")
  var outFile: File = _

  def script(): Unit = {
    // Lower-case name so the instance does not shadow the BaseRecalibrator class.
    val baseRecal = new BaseRecalibrator
    baseRecal.reference_sequence = referenceFile
    baseRecal.input_file = List(bamFile)   // bamFile is already a File
    baseRecal.knownSites = knownSites
    baseRecal.out = outFile                // outFile is already a File

    // Resource settings taken from the command line.
    baseRecal.memoryLimit = maxMem
    baseRecal.scatterCount = numScatters
    baseRecal.nct = numCPUThreads
    add(baseRecal)
  }

}

Then I tried to run it:

java -jar Queue.jar -S MyBaseRecalibrator.scala \
-R reference_genome.fa -I sample.bam \
-knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
-knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
-knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
-o sample.recal_table \
-nct 16 -nsc 4 -mem 4 \
-run -startFromScratch

This correctly splits the data into 4 chunks, but the jobs are run one at a time, so no time is saved. I have seen in other posts that the -bsub flag is needed for the chunks to be processed in parallel, so I changed the command to:

java -jar Queue.jar -S MyBaseRecalibrator.scala \
-R reference_genome.fa -I sample.bam \
-knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
-knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
-knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
-o sample.recal_table \
-nct 16 -nsc 4 -mem 4 \
-run -bsub -startFromScratch

And the following error comes up:

INFO 15:47:23,344 QScriptManager - Compiling 1 QScript
INFO 15:47:33,308 QScriptManager - Compilation complete
INFO 15:47:33,477 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,477 HelpFormatter - Queue v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:07
INFO 15:47:33,478 HelpFormatter - Copyright (c) 2012 The Broad Institute
INFO 15:47:33,478 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
INFO 15:47:33,478 HelpFormatter - Program Args: -S MyBaseRecalibrator.scala -R reference_genome.fa -I Sample.bam -knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf -knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf -knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf -o Sample.recal_table -nct 16 -nsc 4 -mem 4 -run -bsub -startFromScratch
INFO 15:47:33,478 HelpFormatter - Executing as [email protected] on Linux 2.6.32-279.19.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17.
INFO 15:47:33,479 HelpFormatter - Date/Time: 2017/01/27 15:47:33
INFO 15:47:33,479 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,480 HelpFormatter - ----------------------------------------------------------------------
INFO 15:47:33,486 QCommandLine - Scripting MyBaseRecalibrator
INFO 15:47:33,672 QCommandLine - Added 1 functions
INFO 15:47:33,682 QGraph - Generating graph.
INFO 15:47:33,724 QGraph - Generating scatter gather jobs.
INFO 15:47:34,435 QGraph - Removing original jobs.
INFO 15:47:34,436 QGraph - Adding scatter gather jobs.
INFO 15:47:34,615 QGraph - Regenerating graph.
INFO 15:47:34,634 QGraph - Running jobs.
INFO 15:47:34,661 QGraph - Removing outputs from previous runs.
INFO 15:47:34,931 FunctionEdge - Starting: ReadScatterFunction: List(bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx, bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf, Sample.bam, bundle2.8/2.8/b37/dbsnp_138.b37.vcf.idx, Sample.bam.bai, bundle2.8/2.8/b37/dbsnp_138.b37.vcf, reference_genome.fa, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf.idx, Sample.bai) > List(.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_2_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_3_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_4_of_4/scatter.intervals)
INFO 15:47:34,932 FunctionEdge - Output written to .queue/scatterGather/MyBaseRecalibrator-1-sg/scatter/scatter.out
INFO 15:47:35,275 QGraph - 6 Pend, 1 Run, 0 Fail, 0 Done
INFO 15:48:04,942 FunctionEdge - Done: ReadScatterFunction: List(bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx, bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf, Sample.bam, bundle2.8/2.8/b37/dbsnp_138.b37.vcf.idx, Sample.bam.bai, bundle2.8/2.8/b37/dbsnp_138.b37.vcf, reference_genome.fa, bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf.idx, Sample.bai) > List(.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_2_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_3_of_4/scatter.intervals, .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_4_of_4/scatter.intervals)
INFO 15:48:04,942 QGraph - Writing incremental jobs reports...
INFO 15:48:04,943 QJobsReporter - Writing JobLogging GATKReport to file MyBaseRecalibrator.jobreport.txt
INFO 15:48:05,081 FunctionEdge - Starting: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=.queue/tmp' '-cp' '/apps/gatk3/3.7-0/Queue.jar' 'org.broadinstitute.gatk.engine.CommandLineGATK' '-T' 'BaseRecalibrator' '-I' 'Sample.bam' '-L' '.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/scatter.intervals' '-R' 'reference_genome.fa' '-nct' '16' '-knownSites' 'bundle2.8/2.8/b37/dbsnp_138.b37.vcf' '-knownSites' 'bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf' '-knownSites' 'bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf' '-o' '.queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/Sample.recal_table'
INFO 15:48:05,081 FunctionEdge - Output written to .queue/scatterGather/MyBaseRecalibrator-1-sg/temp_1_of_4/Sample.recal_table.out
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000339c732d5f, pid=24019, tid=47381559400192
#
# JRE version: Java(TM) SE Runtime Environment (8.0_66-b17) (build 1.8.0_66-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.66-b17 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libc.so.6+0x132d5f]
#
# Core dump written. Default location: core or core.24019
#
# An error report file with more information is saved as:
# hs_err_pid24019.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)

I am using:
- GATK 3.7
- JRE 1.8.0_66
- LSF 9.1.2

Any help is welcome. Thanks in advance.

Issue · Github
by Sheila

Issue Number: 1698
State: closed
Closed By: vdauwera

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi @zek12, it looks like you're doing the right thing as far as Queue is concerned. I would say the core dump suggests your server is unhappy with something, possibly due to resource limitations, but diagnosing that is beyond the scope of support we can provide. Personally I would check if removing the multithreading makes the problem go away, because multithreading is often a culprit in weird crashes.
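
    One way to test the multithreading hypothesis is to rerun the same command with -nct set to 1 and everything else unchanged. The sketch below assumes the file names and paths from the question; the NCT and DRY_RUN variables are hypothetical helpers introduced here for illustration. By default it only prints the command so it can be reviewed before anything is submitted.

```shell
#!/bin/sh
# Sketch only: same Queue invocation as in the question, with -nct lowered to 1.
# NCT and DRY_RUN are illustrative variables, not Queue options.
NCT=1   # was 16 in the original run

# Store the full command in the positional parameters so it can be
# printed or executed without re-typing it.
set -- java -jar Queue.jar -S MyBaseRecalibrator.scala \
  -R reference_genome.fa -I sample.bam \
  -knownSites bundle2.8/2.8/b37/dbsnp_138.b37.vcf \
  -knownSites bundle2.8/2.8/b37/Mills_and_1000G_gold_standard.indels.b37.vcf \
  -knownSites bundle2.8/2.8/b37/1000G_phase1.indels.b37.vcf \
  -o sample.recal_table \
  -nct "$NCT" -nsc 4 -mem 4 \
  -run -bsub -startFromScratch

# Print the command by default; run with DRY_RUN=0 to actually launch it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$*"
else
  "$@"
fi
```

    If the crash disappears with -nct 1, the thread count could then be raised step by step to see at which point the problem returns.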
