Unable to determine status of job ID error

djakubosky UCSD Member

Hi,
I occasionally get the following type of error when running the Genome STRiP CNVDiscovery pipeline on a cluster running SGE. It runs successfully for long periods of time and then hits an error like this:
WARN 18:12:43,778 DrmaaJobRunner - Unable to determine status of job id 128098
1322 org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=63604 (can't send response for this message id - protocol error).

Usually I can just restart the pipeline and it will go to completion, but I don't know whether I need to restart the whole stage. I sometimes have a hard time telling whether there are actual problems I need to address. Any tips on troubleshooting this correctly?

Thanks!

Answers

  • djakubosky UCSD Member

    I've noticed that stage1 seq_11 has a VCF file with zero size (for unknown reasons). How can I re-run this stage? Do I need to re-run the whole pipeline, or can I delete sentinel files to back up?

  • I have the same issue here. Did you happen to solve the problem?

  • bhandsaker Member, Broadie, Moderator admin

    You can delete the sentinel files if you want to force the stage to rerun. Rerunning a stage will rerun the Queue pipelines for that stage, which will only redo work that needs to be redone. But there shouldn't be a sentinel file unless the stage completed successfully.

    The original problem was likely a transient failure. My first line of defense is always to retry. If the exact same job fails on retry, then I will dig into the (several layers of) log files to see whether there is some reproducible problem.
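
    The retry-first approach described above could be automated along these lines. This is only a minimal sketch, not part of Genome STRiP or Queue: the function name run_with_retries, its parameters, and the default attempt count and delay are all hypothetical, and the command you pass in would be whatever you already use to launch the pipeline stage.

```python
import subprocess
import time


def run_with_retries(cmd, max_attempts=3, delay_seconds=60):
    """Run cmd (a list of arguments) until it exits with status 0.

    Hypothetical helper: returns the 1-based attempt number that
    succeeded, or raises RuntimeError once max_attempts have failed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} failed with exit code {result.returncode}")
        # Wait before retrying so a transient scheduler problem can clear.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        f"command still failing after {max_attempts} attempts; check the logs"
    )


# Usage (the script name is a placeholder for your own launch command):
# run_with_retries(["bash", "my_pipeline_launch_script.sh"])
```

    If the command keeps failing after the last attempt, the sketch raises an error instead of looping forever, which is the point at which digging into the log files makes sense.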

  • Actually I have a slightly different error. I will start another thread.
