Creating a ParallelShellJobRunner for Queue

Hi all!
I'm trying to get a parallel version of the ShellJobRunner working in Queue. This would allow us to parallelize some parts of our workflows that currently run single-core on a full node using the ShellJobRunner and thus waste a lot of resources. I thought I'd made some rather nice progress, until I noticed that if I use it for any job running longer than about 5 minutes, the job runner exits saying that its job failed, while in reality the job keeps running (so it obviously did not fail, and Queue doesn't kill it either).
The code I've come up with so far is available here: https://gist.github.com/johandahlberg/a9b7ac61c3aa2c654899 (as you can see, it's mostly stolen from the regular ShellJobRunner, with some Scala future stuff mixed in).
I'm guessing that the problem comes from me abusing the ProcessController (admittedly, there are warnings in its source about it not being thread safe), but I'm not sure if there is any way to get around it. Any pointers here would be extremely appreciated. Also, if there is any general interest in this feature, I'd be happy to clean up the code a bit and submit a pull request upstream.
/Johan
Best Answer
Johan_Dahlberg ✭✭✭
I did actually figure this out. I'd missed updating the job status when the real status didn't change (e.g. still running since last time). This led the checkUnknownStatus function in CommandLineJobRunner to fail the job after five minutes, since that is the hard-coded interval after which it will say it couldn't find a status. If we can get this reasonably stable I'll submit a pull request with the new job runner.
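For readers who hit the same symptom, the failure mode can be reproduced with a small toy model. This is only a sketch under stated assumptions: ToyRunner, Status, pollOnlyOnChange and pollAlways are hypothetical names made up for illustration and are not Queue's classes. Only the idea (a timestamp that must be refreshed on every poll, and a five-minute limit) comes from the description above.

```scala
object StatusHeartbeatSketch {
  // Hypothetical stand-in for a runner status; not Queue's real type.
  sealed trait Status
  case object Running extends Status
  case object Done extends Status
  case object Failed extends Status

  // Toy runner. `clock` returns the current time in milliseconds and is
  // injected so the timeout can be exercised without actually waiting.
  class ToyRunner(clock: () => Long, maxSilenceMillis: Long = 5 * 60 * 1000L) {
    private var current: Status = Running
    private var lastUpdate: Long = clock()

    def status: Status = current

    // Record a status and refresh the timestamp, even if the status is unchanged.
    def updateStatus(s: Status): Unit = {
      current = s
      lastUpdate = clock()
    }

    // Models the timeout described above: too long without an update means Failed.
    def checkUnknownStatus(): Unit =
      if (clock() - lastUpdate > maxSilenceMillis) current = Failed
  }

  // Buggy polling: only reports to the runner when the observed status differs.
  def pollOnlyOnChange(r: ToyRunner, observed: Status): Unit = {
    if (observed != r.status) r.updateStatus(observed)
    r.checkUnknownStatus()
  }

  // Fixed polling: reports on every poll, which keeps the timestamp fresh.
  def pollAlways(r: ToyRunner, observed: Status): Unit = {
    r.updateStatus(observed)
    r.checkUnknownStatus()
  }

  // Poll once a simulated minute for six minutes while the job is still running.
  def demo(): (Status, Status) = {
    var now = 0L
    val clock = () => now
    val buggy = new ToyRunner(clock)
    val fixed = new ToyRunner(clock)
    for (minute <- 1 to 6) {
      now = minute * 60 * 1000L
      pollOnlyOnChange(buggy, Running)
      pollAlways(fixed, Running)
    }
    (buggy.status, fixed.status)
  }
}
```

Calling StatusHeartbeatSketch.demo() returns (Failed, Running): the runner that only reports changes is declared failed once the simulated clock passes five minutes, while the one that reports every poll stays running. In a real job runner the equivalent change would be to call whatever status-update method the base class provides on every poll, not only on transitions.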
Answers
Hi Johan,
Sorry for the late reply. We certainly can't offer any advice on what you're trying to do. We've had mixed results playing with levels of parallelism and are not putting resources toward that at the moment. If you have the opportunity to develop this into a feature that could be useful to others, we would, as usual, be delighted to look at a pull request, of course.
Also, I guess that you can mark my answer above as the answer. Have you, by the way, considered making it possible to mark your own posts as answers, in the style of Stack Overflow? I often seem to find myself answering my own questions, and then I think it's nice if you can write up the conclusions and leave them in the thread for future reference.
I had not realized it wasn't possible, actually -- I'll check with Vanilla what it would take to enable that.
@Johan_Dahlberg Unfortunately it seems that isn't currently possible, sorry. I put in a feature request, but don't hold your breath.