"Unable to determine status of job" warning in Genome STRiP preprocessing

During the preprocessing step, the following warning keeps appearing:

WARN  10:50:19,392 DrmaaJobRunner - Unable to determine status of job id 2231784
org.ggf.drmaa.DrmCommunicationException: unable to send message to qmaster using port 6448 on host "sge2": got send error
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(
        at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:124)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:123)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
        at scala.collection.immutable.Set$Set3.foreach(Set.scala:115)
        at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
        at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1369)
        at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1361)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.broadinstitute.gatk.queue.engine.QGraph.updateStatus(QGraph.scala:1361)
        at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:548)
        at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:170)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
        at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:61)
        at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)

This will certainly cause a job to fail:

INFO  18:03:26,604 QCommandLine - Script failed: 1 Pend, 0 Run, 1 Fail, 980 Done

If this occurs at the very end of the master job (as shown above: 1 Pend, 0 Run, 1 Fail, 980 Done), then I think the intermediate files have already been deleted, so resubmitting the master job redoes all the jobs from scratch.

The "Unable to determine status of job" warning occurs randomly (it is not specific to a particular job), and resubmitting the master job can fix it. But if it occurs at the very end, resubmitting the master job will redo everything.
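To see how often the warning appears in a run and how the run ended, the Queue log can be summarized with a short script. This is a minimal sketch, assuming the log lines look like the two quoted above; the function name and the sample text are illustrative only, and only the "Script failed: ..." summary format shown in this thread is handled.

```python
import re

# Patterns modeled on the two log lines quoted in this thread (assumed format).
WARN_RE = re.compile(r"Unable to determine status of job id (\d+)")
SUMMARY_RE = re.compile(r"Script (\w+): (\d+) Pend, (\d+) Run, (\d+) Fail, (\d+) Done")


def summarize_queue_log(text):
    """Return the job ids named in status warnings and the counts from the last summary line."""
    warned_ids = WARN_RE.findall(text)
    summary = None
    for match in SUMMARY_RE.finditer(text):
        status, pend, run, fail, done = match.groups()
        # Keep overwriting so that the last summary line in the log wins.
        summary = {
            "status": status,
            "pend": int(pend),
            "run": int(run),
            "fail": int(fail),
            "done": int(done),
        }
    return warned_ids, summary


sample = """\
WARN  10:50:19,392 DrmaaJobRunner - Unable to determine status of job id 2231784
INFO  18:03:26,604 QCommandLine - Script failed: 1 Pend, 0 Run, 1 Fail, 980 Done
"""
ids, summary = summarize_queue_log(sample)
print(ids)
print(summary)
```

If the same job id shows up in the warnings of several runs, the problem is probably not random after all.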

So is there a way to prevent this warning? If not, I think the program needs to be improved so that a late failure does not force everything to be redone.


  • bhandsaker Member, Broadie, Moderator

    Have you verified that rerunning the job will redo the work that was already done? For example, you can rerun without "-run" and this will do a "dry run" and tell you what jobs will be redone. Queue is generally pretty reliable about not redoing work unless somehow the .*.done files get lost.

  • hjzhou Member

    Yes. Jobs 6 through 982 all need to be redone.
    And here are all the files in the metadata folder:

  • skashin Member

    Indeed, once the intermediate files have been deleted, rerunning the preprocessing in this directory will result in most of the jobs being rerun.
    I will make a code change to ensure that deleting intermediate files takes place at the very end of the workflow.

    For your run, I believe that only the last step that creates the file metadata/profiles_100Kb/rd.dat still needs to be run.
    The easiest way to do this would be to run the preprocessing script in dry-run mode, which outputs all the commands, then take the java command for the last step and run it directly from a console.
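Picking the right command out of a long dry-run listing can be done with a small filter. This is only a sketch: it assumes the dry-run output has been saved as text and that each command sits on a single line containing "java". The actual layout of Queue's dry-run output is not shown in this thread, so the matching rule, the function name, and the sample lines below are all hypothetical.

```python
def find_commands(dry_run_lines, target="rd.dat"):
    """Return the lines that look like java commands and mention the target file."""
    hits = []
    for line in dry_run_lines:
        # Assumption: one command per line, and the output path appears in it.
        if "java" in line and target in line:
            hits.append(line.strip())
    return hits


# Made-up placeholder lines for illustration; this is NOT real Queue output.
sample = [
    "java <arguments for some other step> other_output.dat",
    "java <arguments for the last step> metadata/profiles_100Kb/rd.dat",
    "a line with no command that mentions rd.dat",
]
for command in find_commands(sample):
    print(command)
```

If more than one line matches, inspect each by hand before running anything.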

  • bhandsaker Member, Broadie, Moderator
    edited March 23

    It is also possible that the rd.dat job succeeded. This output file (and the .done file) should be in the profiles_100Kb directory. If these files are there, you can just ignore the error. The rd.dat file is, in fact, not used in downstream processing, so you can also just ignore this error if that file is not there for some reason.
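Checking for the two files can be scripted. This is a minimal sketch, assuming the marker is a hidden file named ".rd.dat.done" next to the output. That name is inferred from the ".*.done" pattern mentioned earlier in the thread and is not confirmed here. The function name is illustrative.

```python
from pathlib import Path


def check_output(directory, name="rd.dat"):
    """Report whether the output file and its assumed .done marker exist."""
    folder = Path(directory)
    output = folder / name
    # Assumption: the marker is a hidden file ".<name>.done" in the same folder.
    done_marker = folder / f".{name}.done"
    return {
        "output_exists": output.is_file(),
        "done_exists": done_marker.is_file(),
    }


# Path taken from this thread; adjust it to your own run directory.
print(check_output("metadata/profiles_100Kb"))
```

If both values are True, the job most likely succeeded despite the reported failure.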

  • hjzhou Member

    Thank you very much. I checked the profiles_100Kb folder. There is no rd.dat file. Also, one tbi file is missing for one chromosome. Since it was the M/Y chromosome, which does not seem essential, I continued with the downstream steps and they seemed to work OK. But I think the script could still be improved to ensure that intermediate files are not deleted until the very end.
