Queue and JobReport.txt

Hi,

I've been using Queue for pipelining for a little while now, and have run into an issue with the job report. The issue is that it's blank: after the pipeline has finished, the entire contents of the file are

#:GATKReport.v1.1:0

I'm getting output from Queue saying that

 INFO  21:37:35,646 QJobsReporter - Writing JobLogging GATKReport to file /home/daniel.klevebring/bin/autoseqer/AutoSeqPipeline.jobreport.txt 

Any ideas why? The report would be a great tool, so I'm really interested in getting it to work properly.

Happy holidays,

Daniel

Answers

  • pdexheimer Posts: 448 Member, GSA Collaborator ✭✭✭✭

    Consistently? That file is truncated when it's opened, so it'll certainly get blanked out if you start another run and then kill it before any jobs complete. But that should be pretty rare...

  • Geraldine_VdAuwera Posts: 7,532 Administrator, GATK Developer admin

    That's a new one, no idea what the problem could be, barring the admittedly rare possibility that @pdexheimer raises. If this is a recurring/consistent issue I'll kick it over to Khalid, but it'll probably have to wait until after the holiday break.

    Geraldine Van der Auwera, PhD

  • dklevebring Posts: 76 Member

    Hi,

    So there seems to be something fishy with my pipeline. When I try a minimalist version with just a few steps, the report is written fine, at least when I'm running locally. I have a few more things to try, and will post updates here:

    1. Minimalist pipe on slurm cluster (via Drmaa)
    2. Long pipeline locally (not sure this is a great idea though - it will be CPU and memory intensive).

    A quick question before I continue searching for bugs: is the jobReport supposed to work on SLURM-Drmaa clusters? Has that been tested, or could this be a bug specific to that kind of environment?

    happy holidays,

    Daniel

  • dklevebring Posts: 76 Member

    Hi,

    I have now tested the minimalist version of the pipeline with test data on a SLURM cluster using Drmaa. It produces an empty job report, containing only the single line shown above.

    Any ideas on why?

    thanks

    Daniel

  • Geraldine_VdAuwera Posts: 7,532 Administrator, GATK Developer admin

    Hi Daniel,

    We do not do any testing on SLURM-Drmaa clusters (because we don't have one), so it is fairly likely that the issue is platform-specific, and unfortunately we don't currently have the resources to look into it. I'm afraid we can't help you with this at this time, sorry.

    Geraldine Van der Auwera, PhD

  • dklevebring Posts: 76 Member
    edited January 2014

    @Johan_Dahlberg, have you seen this behavior? I get empty jobReports with both GATK 2.7 and 2.8-1 on SLURM-Drmaa.

  • Johan_Dahlberg Posts: 89 Member ✭✭✭

    @dklevebring: Yes, I see the same behavior. However, I haven't put any effort into debugging it thus far. It would be interesting to hear if you come up with any ideas on how to solve this.

  • pdexheimer Posts: 448 Member, GSA Collaborator ✭✭✭✭

    Hmm, I think I see where the problem has to be, but it's a big brick wall from where I sit, with no knowledge of DRMAA and no way to test.

    • Only jobs that have filled in their JobRunInfo are printed to the job report (queue.util.QJobsReporter.scala:64)
    • isFilledIn() is true if both the start and end times are set (queue.engine.JobRunInfo.scala:61)
    • For LSF, those values are set for jobs marked DONE or FAILED
    • For Drmaa, they're only set for jobs marked DONE, and they're only set if getResourceUsage is non-null on the appropriate org.ggf.drmaa.JobInfo object (queue.engine.drmaa.DrmaaJobRunner.scala:116). The JobInfo is acquired a couple of lines earlier from a call to org.ggf.drmaa.Session.wait().

    My prediction is that either that wait() call isn't working appropriately, or the resulting JobInfo doesn't have the usage information filled in. But that's as far as I can get….
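
    To make that chain concrete, here is a minimal sketch (not the actual Queue source; the JobReportSketch, RunInfoSketch and fillFromUsage names are made up, and the start_time/end_time key names are an assumption) of why a sparse resource-usage map would lead to an empty report:

    import java.util.{Map => JMap}

    object JobReportSketch {
      // Stand-in for Queue's JobRunInfo: a job only reaches the report when
      // both of its times ended up being set, mirroring isFilledIn().
      case class RunInfoSketch(startTime: Option[Long] = None, endTime: Option[Long] = None) {
        def isFilledIn: Boolean = startTime.isDefined && endTime.isDefined
      }

      // Stand-in for the DONE branch in DrmaaJobRunner: pull the times out of the
      // org.ggf.drmaa.JobInfo resource-usage map, if the backend provided them.
      def fillFromUsage(usage: JMap[_, _]): RunInfoSketch = {
        def time(key: String): Option[Long] =
          if (usage == null || !usage.containsKey(key)) None
          else Option(usage.get(key)).map(_.toString.trim.toLong)
        RunInfoSketch(startTime = time("start_time"), endTime = time("end_time"))
      }
    }

    If the map is null or lacks those keys, both lookups come back empty, isFilledIn stays false, and the job is silently dropped, which would leave only the GATKReport header in the file.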

  • dklevebring Posts: 76 Member

    Thanks! I'm going to try adding some debug lines in queue.engine.drmaa.DrmaaJobRunner.scala to understand whether this is solvable. I'll get back to you as soon as I have anything.
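
    As a starting point, a minimal sketch of the kind of debug dump involved (the DrmaaDebugSketch name and the println-based output are made up; only the org.ggf.drmaa calls themselves are the real DRMAA Java API):

    import org.ggf.drmaa.{JobInfo, Session}

    object DrmaaDebugSketch {
      // Illustrative only: print what the DRMAA backend reports once a job finishes.
      def dumpJobInfo(session: Session, jobId: String): Unit = {
        println("DRMAA implementation: " + session.getDrmaaImplementation)
        // DrmaaJobRunner obtains its JobInfo from Session.wait()
        val info: JobInfo = session.wait(jobId, Session.TIMEOUT_WAIT_FOREVER)
        println("wait() returned for job " + jobId + ", hasExited=" + info.hasExited)
        val usage = info.getResourceUsage
        println("resource usage: " + (if (usage == null) "null" else usage.toString))
      }
    }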

  • dklevebring Posts: 76 Member

    Hi!

    So I added some debug lines to DrmaaJobRunner.scala and learned the following:

    • The wait() call seems to work. I get a JobInfo object back.
    • That object has the following info in its getResourceUsage.toString()

      • {submission_time=1390752381, cpu=0, mem=0, vmem=0, walltime=0, hosts=(null)}

    Interestingly, it only has submission_time and is completely missing both start_time and end_time. The remaining keys all have a value of 0.

    I also checked session.getDrmaaImplementation(), which for me returns PSNC DRMAA for SLURM 1.0.7.

  • dklevebring Posts: 76 Member

    …And that's about as far as I can get. As far as I can tell, there could be several reasons:

    • misconfiguration of slurm-drmaa on the cluster(s)
    • bug in slurm-drmaa
    • bug in drmaa

    but I'm probably not the right person to dig further. Sorry about that.

  • Johan_Dahlberg Posts: 89 Member ✭✭✭

    I've been trying to use my extremely limited knowledge of C (read: non-existent) to look into the DRMAA SLURM implementation. Here is something I found interesting (http://apps.man.poznan.pl/trac/slurm-drmaa/browser/trunk/slurm_drmaa/job.c):

    case JOB_COMPLETE:
        fsd_log_debug(("interpreting as DRMAA_PS_DONE"));
        self->state = DRMAA_PS_DONE;
        self->exit_status = job_info->job_array[0].exit_code;
        fsd_log_debug(("exit_status = %d -> %d",self->exit_status, WEXITSTATUS(self->exit_status)));
        break;
    

    From the above, it seems to me that the only thing being passed back once the job is complete is the exit status. So this is probably not a bug, but something that is simply not implemented in the library. I'll send an email to the maintainer of the DRMAA SLURM implementation as soon as I have the time; he's been very helpful in my previous correspondence with him, so I'm sure he'll be able to shed some light on this situation.

  • dklevebring Posts: 76 Member

    Thanks @Johan_Dahlberg, looking forward to hearing how it unravels.

  • santino Posts: 20 Member

    @Johan_Dahlberg, @dklevebring, @Geraldine_VdAuwera, I have the same problem. Any further information? Thanks.

  • santino Posts: 20 Member

    @Johan_Dahlberg said:
    I'm still waiting to hear back from the maintainer of the slurm-drmaa libraries, so no new information at the moment, I'm afraid.

    Is it possible to generate a report separately for each job and then merge the reports after the run?

  • dklevebring Posts: 76 Member

    Unfortunately no; this bug/lack of implementation means that no info is written even for single jobs. You're out of luck until that's fixed.

  • dklevebring Posts: 76 Member

    @Johan_Dahlberg‌ Still nothing from the slurm-drmaa maintainer?

  • Johan_Dahlberg Posts: 89 Member ✭✭✭

    Yes, sad to say, I've had no response. What we need is someone with sufficient knowledge of C to fix this. Do you know someone who might be interested in helping, @dklevebring‌?

  • flescai Posts: 61 Member ✭✭

    I have the same issue here, but I'm using pbs-drmaa-1.0.14

    In previous GATK versions, however, I at least got a printout of the types of jobs under the analysis names BQSRGatherer, Concat, LocusScatterFunction, ReadScatterFunction, SimpleTextGatherFunction, i.e. those running locally.

    Now, I simply get a blank report.

  • dklevebring Posts: 76 Member

    @Johan_Dahlberg‌, I've been giving this some thought but have unfortunately come up empty. No word from the package maintainer, I suppose? Do you reckon any of the admins at Uppmax would be interested?

  • Johan_Dahlberg Posts: 89 Member ✭✭✭

    Still no word from the maintainer, and I haven't really had the time to look into this. However, checking with the guys at Uppmax sounds like a good idea. Could you send them an e-mail, @dklevebring‌?
