We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

Second run of BaseRecalibrator in parallel with HaplotypeCaller

SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember
edited March 2016 in Ask the GATK team

Dear GATK team,

I'd like to ask a question about the possibility of HaplotypeCaller and AnalyzeCovariates running in parallel: I developed a QScript that runs indel realignment, BQSR, variant calling (obtaining gVCF file as a result) and then -- BaseRecalibrator for the second time followed by AnalyzeCovariates. Looking in QScript jobreport PDF file I noticed that the second run of BaseRecalibrator was performed after HaplotypeCaller, though as I can understand, HaplotypeCaller and the second run of BaseRecalibrator are independent regarding data and potentially can run in paralle.

Can I run them in parallel somehow in order to save time?

Issue · Github
by Sheila

Issue Number
Last Updated

Best Answer


  • SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember

    Sorry, failed to attach a plot of jobs.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Sure, the QC portion of BQSR can be done independently of HaplotypeCaller and the rest of the pipeline.

  • SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember
    edited March 2016

    Geraldine, thank you for your answer! Can I somehow make sure of this parallelization in my Queue script? (It doesn't do this parallelization automatically, as we can see from the job plot.)

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Not really; queue will line up jobs based on when their inputs become available, but you can't specifically control the order.

  • SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember

    I went through my QScript again and failed to see any dependencies that could prevent Queue from running HaplotypeCaller and BaseRecalibrator (second time) in parallel. Here is the corresponding snippet of my QScript:

    ` // Generate the second recalibrarion table for BQSR quality control
    val baseRecalibrator2 = new BaseRecalibrator
    baseRecalibrator2.input_file :+= indelRealigner.out
    baseRecalibrator2.intervals :+= qscript.processIntervalsFile
    baseRecalibrator2.reference_sequence = qscript.referenceFile
    baseRecalibrator2.knownSites = knownFilesBQSR
    baseRecalibrator2.BQSR = baseRecalibrator1.out
    baseRecalibrator2.out = qscript.recalTableFile2
    if (qscript.sgBR != 1) {
    baseRecalibrator2.scatterCount = qscript.sgBR
    baseRecalibrator2.nct = qscript.nctBR

         // Perform QC with AnalyzeCovariates
         val analyzeCovariates = new AnalyzeCovariates
         analyzeCovariates.intervals :+= qscript.processIntervalsFile
         analyzeCovariates.reference_sequence = qscript.referenceFile
         analyzeCovariates.beforeReportFile = baseRecalibrator1.out
         analyzeCovariates.afterReportFile = baseRecalibrator2.out
         analyzeCovariates.plotsReportFile = qscript.plotsReportFile
         analyzeCovariates.intermediateCsvFile = qscript.intermediateCsvFile
         // Call variants with HaplotypeCaller and obtain a VCF/gVCF file for the sample
         val haplotypeCaller = new HaplotypeCaller
         haplotypeCaller.input_file :+= printReads.out
         haplotypeCaller.intervals :+= qscript.processIntervalsFile
         haplotypeCaller.reference_sequence = qscript.referenceFile
         if (qscript.pcrModel == "CONSERVATIVE") {
             haplotypeCaller.pcr_indel_model = PCR_ERROR_MODEL.CONSERVATIVE
         } else if (qscript.pcrModel == "NONE") { // for PCR-free data
             haplotypeCaller.pcr_indel_model = PCR_ERROR_MODEL.NONE
         } else if (qscript.pcrModel == "AGGRESSIVE") {
             haplotypeCaller.pcr_indel_model = PCR_ERROR_MODEL.AGGRESSIVE
         } else if (qscript.pcrModel == "HOSTILE") {
             haplotypeCaller.pcr_indel_model = PCR_ERROR_MODEL.HOSTILE
         haplotypeCaller.dbsnp = qscript.dbsnpFile
         if (qscript.emitRefConfidence == "GVCF") {
             haplotypeCaller.emitRefConfidence = ReferenceConfidenceMode.GVCF
             haplotypeCaller.variant_index_type = GATKVCFIndexType.LINEAR
             haplotypeCaller.variant_index_parameter = Some(128000)
         } else if (qscript.emitRefConfidence == "NONE") {
             haplotypeCaller.emitRefConfidence = ReferenceConfidenceMode.NONE
         } else if (qscript.emitRefConfidence == "BP_RESOLUTION") {
             haplotypeCaller.emitRefConfidence = ReferenceConfidenceMode.BP_RESOLUTION
         haplotypeCaller.out = qscript.gvcfFile
         if (qscript.nctHC != 1) {
             haplotypeCaller.nct = qscript.nctHC
         if (qscript.sgHC != 1) {
             haplotypeCaller.scatterCount = qscript.sgHC

    This is the end of the script function. (Sorry for code formatting inconsistency...)

    Maybe Queue can't run independent jobs in parallel because I have no job runner on my server?

  • SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember

    Geraldine, thank you very much for your answer! I'll go through the release notes for 3.5!

  • SvyatoslavSidorovSvyatoslavSidorov St. Petersburg, RussiaMember

    Yes, that was the case! I just added -jobRunner ParallelShell to my Queue command, and all independent tasks began to run in parallel. Thanks again!

Sign In or Register to comment.