IndelRealigner nWayOut + Queue

SanderB, Netherlands, Member

I've recently started using the nWayOut option during indel realignment. Without Queue, this option works just fine and produces a separate BAM file for every input file. However, when I use nWayOut in a Queue Scala script, it produces a *_realigned.bam file per chunk per sample and never merges these into a single per-sample BAM file. Am I missing something really obvious here?
I've copied the relevant part of my Scala script below. I'm using version 2.7.2 of GATK + Queue.

val realigner = new IndelRealigner with UnifiedGenotyperArguments

realigner.scatterCount = 136

for (bamFile <- bamFiles) {
    realigner.input_file :+= bamFile
}

realigner.targetIntervals = qscript.targetIntervals
realigner.nWayOut = "_realigned.bam"

add(realigner)

Answers

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    It's hard to say without seeing more of the script, but have a look at this post for a full example: http://gatkforums.broadinstitute.org/discussion/3441/queue-how-to-connect-gatk-walkers

  • pdexheimer, Member ✭✭✭✭

    I don't think this is possible in Queue, at least the way things are currently written. "nWayOut" is currently marked as an Argument, not as an Output. Only Outputs can be gathered - but simply changing the tag wouldn't work because there is no file called "_realigned.bam".

    I feel like this should be something you can do with another Scala function that identifies all of the scattered pieces and then invokes the BamGatherFunction on each. But I haven't really thought through everything, so it's possible I'm missing some important piece

  • SanderB, Netherlands, Member

    @Geraldine_VdAuwera said:
    It's hard to say without seeing more of the script, but have a look at this post for a full example: http://gatkforums.broadinstitute.org/discussion/3441/queue-how-to-connect-gatk-walkers

    Thanks for the quick reply. At two different sites we tried an exact copy of the "multi-sample data processing" part of the script you linked to. The only change we made was adding a scatterCount of 5000. The output consists of 5000 × (number of samples) small (cleaned.bam) BAM chunks.

    @pdexheimer said:
    I don't think this is possible in Queue, at least the way things are currently written. "nWayOut" is currently marked as an Argument, not as an Output. Only Outputs can be gathered - but simply changing the tag wouldn't work because there is no file called "_realigned.bam".

    I feel like this should be something you can do with another Scala function that identifies all of the scattered pieces and then invokes the BamGatherFunction on each. But I haven't really thought through everything, so it's possible I'm missing some important piece

    How would one go about merging these little BAM chunks using the BamGatherFunction? Is the location of these BAM chunks stored somewhere?

  • SanderB, Netherlands, Member

    @Geraldine_VdAuwera said:
    OK, I've checked with Khalid (author of Queue); it is indeed currently impossible to use nWayOut with scatter-gather. You can use it in your script but you'd have to either disable scatter gather for that function, or implement pdexheimer's solution.

    It would be a lovely feature to have in the future. For now I'll script around it. Thanks for the quick replies.

    @pdexheimer said:
    Assuming that your workflow requires the use of nWayOut, the only solution I can see is to stop the QScript after IndelRealigner, run a separate process after it completes that reassembles the bams (BamGatherFunction is basically just a wrapper around Picard's MergeSamFiles), and then run another QScript to pick up your workflow from that point. Not a pretty solution by any means

    Thanks for the idea. I did indeed come up with a similar solution, using sambamba (https://github.com/lomereiter/sambamba) instead of Picard; it does the same thing, only faster.
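
The workaround discussed in this thread (find each sample's scattered chunks, then merge them into one BAM per sample) could be sketched roughly as follows. This is a minimal sketch in plain Scala, not part of Queue and not the script actually used by anyone in the thread. It assumes that every chunk of a sample carries the same file name (the input name with the "_realigned.bam" suffix) inside its own scatter directory; the directory layout, the object name MergeRealignedChunks, and the helpers listFiles, groupChunksBySample and mergeCommand are all hypothetical. It only prints one `sambamba merge <output> <inputs...>` command per sample instead of running anything.

```scala
import java.io.File

// Hypothetical stand-alone helper; none of these names come from Queue or GATK.
object MergeRealignedChunks {

  // Recursively collect all regular files below `dir`.
  def listFiles(dir: File): Seq[File] = {
    val children = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty[File])
    children.flatMap(f => if (f.isDirectory) listFiles(f) else Seq(f))
  }

  // Group chunk BAMs by sample. ASSUMPTION: every scatter directory holds one
  // "<sample><suffix>" file per input BAM, so the file name identifies the sample.
  def groupChunksBySample(files: Seq[File], suffix: String): Map[String, Seq[File]] =
    files
      .filter(_.getName.endsWith(suffix))
      .groupBy(_.getName.stripSuffix(suffix))
      .map { case (sample, chunks) => sample -> chunks.sortBy(_.getPath) }

  // Build (but do not run) a command of the form:
  //   sambamba merge <output.bam> <chunk1.bam> <chunk2.bam> ...
  def mergeCommand(sample: String, chunks: Seq[File], outDir: File, suffix: String): Seq[String] =
    Seq("sambamba", "merge", new File(outDir, sample + suffix).getPath) ++ chunks.map(_.getPath)

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("usage: MergeRealignedChunks <scatterRoot> <outDir>")
      sys.exit(1)
    }
    val suffix = "_realigned.bam" // the nWayOut suffix from the original script
    val grouped = groupChunksBySample(listFiles(new File(args(0))), suffix)
    // Print one merge command per sample, in a stable order.
    for ((sample, chunks) <- grouped.toSeq.sortBy(_._1))
      println(mergeCommand(sample, chunks, new File(args(1)), suffix).mkString(" "))
  }
}
```

Printing the commands rather than executing them makes it easy to check the grouping before any merge is started. The output directory should sit outside the scatter root so that merged files are not picked up as chunks on a later run.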
