
Unable to determine status of job ID error

djakubosky (UCSD, Member)

I occasionally get the following type of error when running the Genome STRiP CNVDiscovery pipeline on a cluster running SGE. The pipeline runs successfully for long periods of time and then hits an error like this:
WARN 18:12:43,778 DrmaaJobRunner - Unable to determine status of job id 128098
1322 org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=63604 (can't send response for this message id - protocol error).

Usually I can just restart the pipeline and it runs to completion, but I don't know whether I need to restart the whole stage. I sometimes have a hard time telling whether there are actual problems I need to address. Any tips on troubleshooting this correctly?



  • djakubosky (UCSD, Member)

    I've noticed that stage1 seq_11 has a VCF file of zero size (for unknown reasons). How can I re-run this stage? Do I need to re-run the whole pipeline, or can I delete sentinel files to back up?

  • I have the same issue here. Did you happen to solve the problem?

  • bhandsaker (Member, Broadie ✭✭✭✭)

    You can delete the sentinel files if you want to force the stage to rerun. Rerunning a stage will rerun the Queue pipelines for that stage, which will only redo work that needs to be redone. But there shouldn't be a sentinel file unless the stage completed successfully.

    The original problem was likely a transient failure. My first line of defense is always to retry. If the exact same job fails on retry, then I will dig into the (several layers of) log files to see whether there is some reproducible problem.

  • Actually I have a slightly different error. I will start another thread.
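
For anyone landing on this thread later, the "retry first, then check the outputs" advice above could be scripted roughly as follows. This is only a sketch under stated assumptions, not part of Genome STRiP: the helper names `run_with_retries` and `find_empty_files` are made up for illustration, it assumes the pipeline launch command exits with a non-zero status when a run fails, and it assumes the output files of interest end in `.vcf`.

```python
import subprocess
import sys
import time
from pathlib import Path


def find_empty_files(run_dir, suffix=".vcf"):
    """Return sorted paths under run_dir that end with suffix and have size 0."""
    root = Path(run_dir)
    return sorted(
        p for p in root.rglob("*" + suffix)
        if p.is_file() and p.stat().st_size == 0
    )


def run_with_retries(command, max_attempts=3, wait_seconds=60):
    """Run command until it exits with status 0 or max_attempts is reached.

    Returns the number of the attempt that succeeded, or None if every
    attempt failed. Assumes (hypothetically) that a failed pipeline run
    is reported through a non-zero exit status.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(command)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} exited with {result.returncode}",
              file=sys.stderr)
        if attempt < max_attempts:
            # Give a transient scheduler problem some time to clear.
            time.sleep(wait_seconds)
    return None
```

A call might look like `run_with_retries(["bash", "run_cnv_discovery.sh"])`, where `run_cnv_discovery.sh` is a placeholder for whatever script launches the pipeline, followed by `find_empty_files("path/to/run_dir")` to list zero-size VCF files. If the retries keep failing on the same job, or empty outputs remain, that is the point to dig into the log files as suggested above.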
