
Creating a ParallelShellJobRunner for Queue

Johan_Dahlberg Member ✭✭✭
edited February 2015 in Ask the GATK team

Hi all!

I'm trying to get a parallel version of the ShellJobRunner working in Queue. It would let us parallelize the parts of our workflows that currently run single-core on a full node under the ShellJobRunner, and so waste a lot of resources. I thought I'd made some rather nice progress, until I noticed that for any job running longer than about 5 minutes, the job runner exits saying that its job failed, while in reality the job keeps running (so it obviously did not fail, and Queue doesn't kill it either).

The code I've come up with so far is available here: https://gist.github.com/johandahlberg/a9b7ac61c3aa2c654899 (As you can see, it's mostly stolen from the regular ShellJobRunner, with some Scala future stuff mixed in.)

I'm guessing that the problem comes from me abusing the ProcessController (admittedly, there are warnings in its source about it not being thread safe), but I'm not sure whether there is any way to get around it. Any pointers here would be extremely appreciated. Also, if there is any general interest in this feature, I'd be happy to clean up the code a bit and submit a pull request upstream.
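
For readers following along, here is a minimal sketch of the general idea of running several shell jobs concurrently with Scala futures. It is not the code from the gist and does not use Queue's ShellJobRunner or ProcessController at all: it relies only on the standard library's scala.sys.process, so each job gets its own process handle and nothing is shared between threads. The names ParallelShellSketch, JobResult, runAll and maxParallel are made up for illustration, and whether this approach avoids the failure described above is untested.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration
import scala.sys.process._

object ParallelShellSketch {
  // Hypothetical result type: the command that was run and its exit code.
  final case class JobResult(command: String, exitCode: Int)

  // Run every command through `sh -c`, at most `maxParallel` at a time,
  // and return the results in the same order as the input.
  def runAll(commands: Seq[String], maxParallel: Int): Seq[JobResult] = {
    // A fixed-size pool caps how many jobs run at once.
    val pool = Executors.newFixedThreadPool(maxParallel)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = commands.map { cmd =>
        Future {
          // Each future starts its own process; `!` blocks this worker
          // thread until the process exits and returns its exit code.
          val exitCode = Process(Seq("sh", "-c", cmd)).!
          JobResult(cmd, exitCode)
        }
      }
      // Wait for all jobs, however long they take.
      Await.result(Future.sequence(futures), Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }
}
```

Calling ParallelShellSketch.runAll(Seq("sleep 2", "sleep 2"), 2) should take roughly two seconds rather than four, since both commands run at the same time. A real job runner would also need to report status back to Queue and handle cancellation, which this sketch leaves out.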



Best Answer


  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Johan,

    Sorry for the late reply. We certainly can't offer any advice on what you're trying to do. We've had mixed results playing with levels of parallelism and are not putting resources toward that at the moment. If you have the opportunity to develop this into a feature that could be useful to others, we would as usual be delighted to look at a pull request, of course :)

  • Johan_Dahlberg Member ✭✭✭

    Also, I guess that you can mark my answer above as the answer. By the way, have you considered making it possible to mark your own posts as answers, in the style of Stack Overflow? I often seem to find myself answering my own questions, and I think it's nice to be able to write up the conclusions and leave them in the thread for future reference.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    I had not realized it wasn't possible, actually -- I'll check with Vanilla what it would take to enable that.

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @Johan_Dahlberg Unfortunately it seems that isn't currently possible, sorry. I put in a feature request, but nobody should hold their breath.
