Unable to determine status of job ID error

Hi,
I occasionally get the following type of error when using the Genome STRiP CNVDiscovery pipeline on a cluster running SGE. It runs successfully for long periods of time and then hits an error like this:
WARN 18:12:43,778 DrmaaJobRunner - Unable to determine status of job id 128098
1322 org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=63604 (can't send response for this message id - protocol error).

Usually I can just restart the pipeline and it runs to completion, but I don't know whether I need to restart the whole stage. I sometimes have a hard time telling whether there are actual problems I need to address. Any tips on troubleshooting this correctly?

Thanks!

Answers

  • I've noticed that stage1 seq_11 has a VCF file of zero size (for unknown reasons). How can I rerun this stage? Do I need to rerun the whole pipeline, or can I delete sentinel files to back up?

  • I have the same issue here. Did you happen to solve the problem?

  • bhandsaker (Member, Broadie, Moderator)

    You can delete the sentinel files if you want to force the stage to rerun. Rerunning a stage will rerun the Queue pipelines for that stage, which will only redo work that needs to be redone. But there shouldn't be a sentinel file unless the stage completed successfully.

    The original problem was likely a transient failure. My first line of defense is always to retry. If the exact same job fails on retry, then I will dig into the (several layers of) log files to see whether there is some reproducible problem.
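The stage1 seq_11 symptom described above (a VCF file with no size) can be checked for before deciding which stage to rerun. Below is a minimal Python sketch, assuming stage outputs are files ending in `.vcf` or `.vcf.gz` somewhere under the run directory. `find_empty_vcfs` is a hypothetical helper, not part of Genome STRiP or Queue, and it only lists files; it does not delete any sentinel files.

```python
from pathlib import Path


def find_empty_vcfs(run_dir):
    """Return a sorted list of zero-byte VCF files under run_dir.

    Assumption: stage outputs end in .vcf or .vcf.gz. Adjust the
    suffixes to match your own run directory layout.
    """
    empty = []
    # Walk the whole run directory recursively.
    for path in Path(run_dir).rglob("*"):
        name = path.name
        is_vcf = name.endswith(".vcf") or name.endswith(".vcf.gz")
        # A zero-byte file is never a valid VCF (even a valid .vcf.gz
        # has a gzip header), so it signals an incomplete output.
        if path.is_file() and is_vcf and path.stat().st_size == 0:
            empty.append(path)
    return sorted(empty)
```

For example, `find_empty_vcfs("/path/to/run_dir")` (a placeholder path) returns the empty files, and the directories they sit in indicate which stage's outputs need to be regenerated.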
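The retry-first advice in the answer above can be sketched as a small wrapper. This is a minimal Python illustration, assuming the job can be expressed as a callable (for example, a function that relaunches the pipeline). `run_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, not options of Queue or Genome STRiP. If every attempt fails, the collected error messages are what you would compare to decide whether the failure is reproducible and worth digging into the log files.

```python
import time


def run_with_retries(job, max_attempts=3, delay_seconds=0.0):
    """Call job() until it succeeds or max_attempts is reached.

    Returns job()'s result on the first success. If every attempt fails,
    raises RuntimeError (chained to the last exception) that lists each
    attempt's error, so you can see whether the same failure repeats.
    """
    errors = []
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:  # e.g. a transient cluster-communication error
            last_error = exc
            errors.append(f"attempt {attempt}: {exc!r}")
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # brief pause before retrying
    raise RuntimeError(
        "job failed on every attempt:\n" + "\n".join(errors)
    ) from last_error
```

A failure that clears on the second or third attempt looks transient; one that fails identically every time is the case where reading the logs pays off.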