
# Queue and JobReport.txt

dklevebring (Posts: 57, Member)

Hi,

I've been using Queue for pipelining for a little while now, and have run into an issue with the job report: it's blank. After the pipeline has finished, the entire contents of the file are

#:GATKReport.v1.1:0


I'm getting output from Queue saying that

 INFO  21:37:35,646 QJobsReporter - Writing JobLogging GATKReport to file /home/daniel.klevebring/bin/autoseqer/AutoSeqPipeline.jobreport.txt


Any ideas why? The report would be a great tool, so I'm really interested in getting it to work properly.

Happy holidays,

Daniel


• pdexheimer (Posts: 324, Member, GSA Collaborator ✭✭✭)

Consistently? That file is truncated when it's opened, so it will certainly get blanked out if you start another run and then kill it before any jobs complete. But that should be pretty rare...

That's a new one, no idea what the problem could be, barring the admittedly rare possibility that @pdexheimer raises. If this is a recurring/consistent issue I'll kick it over to Khalid, but it'll probably have to wait until after the holiday break.

Geraldine Van der Auwera, PhD

• dklevebring (Posts: 57, Member)

Hi,

So there seems to be something fishy with my pipeline. When I try a minimalist version with just a few steps, the report is written fine, at least when I'm running locally. I have a few more things to try, and will post updates here:

1. Minimalist pipeline on the SLURM cluster (via DRMAA)
2. Long pipeline locally (not sure this is a great idea though; it will be CPU- and memory-intensive)

A quick question before I continue searching for bugs: is the job report supposed to work on SLURM-DRMAA clusters? Has that been tested, or could this be a bug specific to that kind of environment?

happy holidays,

Daniel

• dklevebring (Posts: 57, Member)

Hi,

I have now tested the minimalist version of the pipeline with test data on a SLURM cluster using DRMAA. It produces an empty job report, containing only the single line shown above.

Any ideas on why?

thanks

Daniel

Hi Daniel,

We do not do any testing on SLURM-DRMAA clusters (because we don't have one), so it is fairly likely that the issue is platform-specific. Unfortunately, we don't currently have the resources to look into this, so I'm afraid we can't help you with it at this time, sorry.

Geraldine Van der Auwera, PhD

• dklevebring (Posts: 57, Member)

@Johan_Dahlberg, have you seen this behavior? I get empty jobReports with both GATK 2.7 and 2.8-1 on SLURM-Drmaa.

• Johan_Dahlberg (Posts: 85, Member ✭✭✭)

@dklevebring: Yes, I see the same behavior. However, I've not put any effort into debugging it thus far. It would be interesting to hear if you come up with any ideas on how to solve this.

• pdexheimer (Posts: 324, Member, GSA Collaborator ✭✭✭)

Hmm, I think I see where the problem has to be, but it's a brick wall from where I sit, with no knowledge of DRMAA and no way to test.

• Only jobs that have filled in their JobRunInfo are printed to the job report (queue.util.QJobsReporter.scala:64)
• isFilledIn() is true if both the start and end times are set (queue.engine.JobRunInfo.scala:61)
• For LSF, those values are set for jobs marked DONE or FAILED
• For Drmaa, they're only set for jobs marked DONE, and they're only set if getResourceUsage is non-null on the appropriate org.ggf.drmaa.JobInfo object (queue.engine.drmaa.DrmaaJobRunner.scala:116). The JobInfo is acquired a couple of lines earlier from a call to org.ggf.drmaa.Session.wait().

My prediction is that either that wait() call isn't working properly, or the resulting JobInfo doesn't have the usage information filled in. But that's as far as I can get…
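The gating behavior described above can be sketched roughly as follows. This is a hypothetical Python re-implementation of the checks in QJobsReporter.scala and JobRunInfo.scala, purely to illustrate the failure mode; the class and function names are illustrative, not the actual GATK/Queue API:

```python
# Illustrative sketch (not GATK code): a job only makes it into the
# job report when its runner filled in both the start and end times.

class JobRunInfo:
    def __init__(self, start_time=None, end_time=None):
        self.start_time = start_time
        self.end_time = end_time

    def is_filled_in(self):
        # Mirrors JobRunInfo.isFilledIn(): both timestamps must be set.
        return self.start_time is not None and self.end_time is not None

def jobs_for_report(jobs):
    # Mirrors QJobsReporter: jobs without run info are silently dropped.
    return [job for job in jobs if job.is_filled_in()]

# An LSF-style runner fills in both times for DONE/FAILED jobs.
lsf_job = JobRunInfo(start_time=1390752400, end_time=1390752500)

# A DRMAA runner that gets no start/end times from getResourceUsage()
# never fills anything in, so its jobs never reach the report.
drmaa_job = JobRunInfo()

print(len(jobs_for_report([lsf_job, drmaa_job])))  # → 1
```

If every job in the run goes through the DRMAA path with empty resource usage, the filtered list is empty and the report contains only the `#:GATKReport.v1.1:0` header, which matches the symptom reported here.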

• dklevebring (Posts: 57, Member)

Thanks! I'm going to try adding some debug lines in queue.engine.drmaa.DrmaaJobRunner.scala to see whether this is solvable. I'll get back to you as soon as I have anything.

• dklevebring (Posts: 57, Member)

Hi!

So I added some debug lines to DrmaaJobRunner.scala and learned the following:

• The wait() call seems to work. I get a JobInfo object back.
• That object has the following info in its getResourceUsage.toString()
• {submission_time=1390752381, cpu=0, mem=0, vmem=0, walltime=0, hosts=(null)}

Interestingly, it only has submission_time, and is completely missing both start_time and end_time. The remaining keys are 0.

I also checked session.getDrmaaImplementation(), which for me reports PSNC DRMAA for SLURM 1.0.7.
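The resource-usage map posted above can be inspected programmatically. Here is a small, purely illustrative Python sketch (the parser is mine, not part of Queue) that parses the toString() output and checks for the two keys Queue's DRMAA runner needs:

```python
# Illustrative sketch: parse the getResourceUsage() string reported
# above and check for the timestamps that Queue needs to fill in
# JobRunInfo. The input format matches the toString() output posted.

def parse_resource_usage(text):
    # "{submission_time=1390752381, cpu=0, ...}" -> dict of strings
    body = text.strip().lstrip("{").rstrip("}")
    pairs = (item.split("=", 1) for item in body.split(", "))
    return {key: value for key, value in pairs}

usage = parse_resource_usage(
    "{submission_time=1390752381, cpu=0, mem=0, vmem=0, walltime=0, hosts=(null)}"
)

# PSNC slurm-drmaa 1.0.7 only reports submission_time, so the two
# keys the job report depends on are both absent.
print("start_time" in usage, "end_time" in usage)  # → False False
```

With start_time and end_time missing, the runner can never mark the job's run info as filled in, which is consistent with the empty report.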

• dklevebring (Posts: 57, Member)

…And that's about as far as I can get. As far as I can see, there could be several reasons:

• misconfiguration of slurm-drmaa on the cluster(s)
• bug in slurm-drmaa
• bug in drmaa

but I'm probably not the right person to dig further. Sorry about that.

• Johan_Dahlberg (Posts: 85, Member ✭✭✭)

I've been trying to use my extremely limited knowledge of C (read: non-existent) to look into the DRMAA SLURM implementation. Here is something I found interesting (http://apps.man.poznan.pl/trac/slurm-drmaa/browser/trunk/slurm_drmaa/job.c):

    case JOB_COMPLETE:
        fsd_log_debug(("interpreting as DRMAA_PS_DONE"));
        self->state = DRMAA_PS_DONE;
        self->exit_status = job_info->job_array[0].exit_code;
        fsd_log_debug(("exit_status = %d -> %d", self->exit_status, WEXITSTATUS(self->exit_status)));
        break;


In the above, it seems to me that the only thing being passed back once the job is complete is the exit status. So this is probably not a bug, but rather something that is not implemented in the library. I'll send an email to the maintainer of the DRMAA SLURM implementation as soon as I have time; he's been very helpful in my previous correspondence with him, so I'm sure he'll be able to shed some light on this situation.

• dklevebring (Posts: 57, Member)

Thanks @Johan_Dahlberg, looking forward to hearing how it unravels.

• Posts: 20, Member

@Johan_Dahlberg, @dklevebring, @Geraldine_VdAuwera, I have the same problem. Any further information? Thanks.

• Posts: 20, Member

@Johan_Dahlberg said: I'm still waiting to hear back from the maintainer of the slurm-drmaa libraries. So no new information at the moment I'm afraid.

Is it possible to generate report separately for each job and merge them after running?

• dklevebring (Posts: 57, Member)

Unfortunately no; this bug (or missing implementation) causes no info to be written even for single jobs. You're out of luck until that's fixed.

• dklevebring (Posts: 57, Member)

@Johan_Dahlberg Still nothing from the slurm-drmaa maintainer?

• Johan_Dahlberg (Posts: 85, Member ✭✭✭)

Yes, sad to say, I've had no response. What we need is someone with sufficient knowledge of C to fix this. Do you know someone who might be interested in helping, @dklevebring?

• Posts: 51, Member ✭✭

I have the same issue here, but I'm using pbs-drmaa-1.0.14

In previous GATK versions, however, I at least got a printout of the locally run jobs, under the analysis names BQSRGatherer, Concat, LocusScatterFunction, ReadScatterFunction, and SimpleTextGatherFunction.

Now, I simply get a blank report.

• dklevebring (Posts: 57, Member)

@Johan_Dahlberg, I've been giving this some thought but have unfortunately come up empty. No word from the package maintainer, I suppose? Do you reckon any of the admins at Uppmax would be interested?

• Johan_Dahlberg (Posts: 85, Member ✭✭✭)

Still no word from the maintainer, and I haven't really had time to look into this. However, checking with the guys at Uppmax sounds like a good idea. Could you send them an e-mail, @dklevebring?