IndelRealigner nWayOut + Queue

SanderB (Netherlands, Member)

I've recently started using the nWayOut option during indel realignment. Without Queue it works just fine and produces a separate BAM file for every input file. However, when I use nWayOut in a Queue Scala script, it produces a *_realigned.bam file per chunk per sample and never merges these into a single per-sample BAM file. Am I missing something really obvious here?
I've copied the relevant part of my Scala script below. I'm using version 2.7.2 of GATK + Queue.

val realigner = new IndelRealigner with UnifiedGenotyperArguments

realigner.scatterCount = 136

for (bamFile <- bamFiles) {
    realigner.input_file :+= bamFile
}

realigner.targetIntervals = qscript.targetIntervals
realigner.nWayOut = "_realigned.bam"

add(realigner)

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    It's hard to say without seeing more of the script, but have a look at this post for a full example: http://gatkforums.broadinstitute.org/discussion/3441/queue-how-to-connect-gatk-walkers

  • pdexheimer (Member ✭✭✭✭)

    I don't think this is possible in Queue, at least the way things are currently written. "nWayOut" is currently marked as an Argument, not as an Output. Only Outputs can be gathered - but simply changing the tag wouldn't work because there is no file called "_realigned.bam".

    I feel like this should be something you can do with another Scala function that identifies all of the scattered pieces and then invokes the BamGatherFunction on each. But I haven't really thought through everything, so it's possible I'm missing some important piece

  • SanderB (Netherlands, Member)

    @Geraldine_VdAuwera said:
    It's hard to say without seeing more of the script, but have a look at this post for a full example: http://gatkforums.broadinstitute.org/discussion/3441/queue-how-to-connect-gatk-walkers

    Thanks for the quick reply. At two different sites we ran an exact copy of the "multi-sample data processing" part of the script you linked to. The only change we made was to add a scatterCount of 5000. The output then consists of 5000 × (number of samples) small cleaned.bam chunks, which are never merged.

    @pdexheimer said:
    I don't think this is possible in Queue, at least the way things are currently written. "nWayOut" is currently marked as an Argument, not as an Output. Only Outputs can be gathered - but simply changing the tag wouldn't work because there is no file called "_realigned.bam".

    I feel like this should be something you can do with another Scala function that identifies all of the scattered pieces and then invokes the BamGatherFunction on each. But I haven't really thought through everything, so it's possible I'm missing some important piece

    How would one go about merging these small BAM chunks using the BamGatherFunction? Is the location of the chunks stored somewhere?

  • SanderB (Netherlands, Member)

    @Geraldine_VdAuwera said:
    OK, I've checked with Khalid (author of Queue); it is indeed currently impossible to use nWayOut with scatter-gather. You can use it in your script but you'd have to either disable scatter gather for that function, or implement pdexheimer's solution.

    It would be a lovely feature to have in the future. For now I'll script around it. Thanks for the quick replies.

    @pdexheimer said:
    Assuming that your workflow requires the use of nWayOut, the only solution I can see is to stop the QScript after IndelRealigner, run a separate process after it completes that reassembles the bams (BamGatherFunction is basically just a wrapper around Picard's MergeSamFiles), and then run another QScript to pick up your workflow from that point. Not a pretty solution by any means

    Thanks for the idea. I indeed came up with a similar solution, using sambamba (https://github.com/lomereiter/sambamba) instead of Picard. It does the same thing, only faster.
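
For anyone landing on this thread later, the workaround discussed above (collect the per-chunk BAMs for each sample after the QScript finishes, then merge them outside Queue) could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Queue or from anyone's actual pipeline: the object and helper names (MergeRealignedChunks, findChunks, groupBySample, mergeCommand) are made up, the scatter directory is whatever directory your run wrote its pieces to, and it assumes every scattered piece of a given sample carries the same file name (for example sampleA_realigned.bam) in its own chunk directory. It prints `sambamba merge <output> <inputs...>` command lines instead of running them.

```scala
import java.io.File

// Hypothetical helper, not part of Queue or GATK.
object MergeRealignedChunks {

  /** Recursively list every file under `root` whose name ends with `suffix`. */
  def findChunks(root: File, suffix: String): Seq[File] = {
    // listFiles returns null for non-directories or unreadable paths.
    val entries = Option(root.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    entries.flatMap { f =>
      if (f.isDirectory) findChunks(f, suffix)
      else if (f.getName.endsWith(suffix)) Seq(f)
      else Seq.empty
    }
  }

  /** Group chunks by file name (assumption: one name per sample),
    * sorting each group by path for a reproducible command line. */
  def groupBySample(chunks: Seq[File]): Map[String, Seq[File]] =
    chunks.groupBy(_.getName).map { case (name, files) =>
      name -> files.sortBy(_.getPath)
    }

  /** Build the argument list for `sambamba merge <out> <in1> <in2> ...`. */
  def mergeCommand(outDir: File, name: String, chunks: Seq[File]): Seq[String] =
    Seq("sambamba", "merge", new File(outDir, name).getPath) ++ chunks.map(_.getPath)

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "usage: MergeRealignedChunks <scatterDir> <outDir>")
    val scatterDir = new File(args(0))
    val outDir = new File(args(1))
    val bySample = groupBySample(findChunks(scatterDir, "_realigned.bam"))
    // Print one merge command per sample; nothing is executed here.
    for ((name, chunks) <- bySample.toSeq.sortBy(_._1))
      println(mergeCommand(outDir, name, chunks).mkString(" "))
  }
}
```

Keep the output directory outside the scatter directory, otherwise a second run would pick up the merged files as if they were chunks. The printed commands can be inspected first and then piped to a shell, or swapped for Picard's MergeSamFiles if you prefer.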