
Troubleshooting in FireCloud

FireCloud_Team (Moderator)
edited August 2018 in Tutorials

In this document we'll go over some basic strategies for investigating failed workflows on FireCloud. This isn’t a guide to solving every error, but a doc to show new users how to diagnose failed submissions. Descriptions of more complicated errors are always welcome on the FireCloud Forum, where our team is happy to help.

Checking Log Messages

At this point we will assume that you have a workspace set up with a data model and method configuration loaded. You’ve launched your method, but your submission has failed. Don’t despair! There is some information you can gather that will be helpful in getting your method up and running.
In your workspace, go to the Monitor tab and click View to visit the submission that failed. Then click View again in the data entity column to see details about the submission.

Click “Show”, located to the right of Failures, to display any error messages. The message listed under Failures isn't always short and sweet and, if interpreted incorrectly, can lead you down the wrong debugging path. Instead, use the message to identify which task failed, then investigate that task. In this case, the failed task is Hello_GATK.HaplotypeCaller_GVCF.

Click Show on the task that failed to gain access to three very useful files for debugging: stderr, stdout, and the JES log. These files are generated by Cromwell when executing any task and are placed in the task's folder along with its output. In FireCloud we add quick links to these files in the Monitor tab to make troubleshooting easier.

  • Standard Out (stdout): A file containing log output generated by the commands in the task. Not all commands generate log output, so this file may be empty.
  • Standard Error (stderr): A file containing error messages produced by the commands executed in the task. This is a good place to start for a failed task.
  • JES log: A log file tracking the events that occurred while performing the task, such as downloading the Docker image and localizing files. Occasionally a workflow will fail without producing stderr and stdout files, leaving you with only a JES log. More on this in the next section.

Many common task-level errors are indicated in the stderr file. Click on the link to the stderr.log file (in this example, Haplotypecaller_GVCF-stderr.log) and a window will appear giving you a glimpse of the file.
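If the stderr file is long, it can help to download it and filter for the lines most likely to describe the failure. The snippet below is only a rough sketch: the keyword list and the function name find_error_lines are assumptions made for illustration, and the sample text is made up rather than copied from a real log.

```python
from typing import List

# Assumed keywords; adjust them to the tools your task actually runs.
KEYWORDS = ("error", "exception", "no such file", "does not exist")


def find_error_lines(log_text: str) -> List[str]:
    """Return the lines of a log that contain any of the keywords (case-insensitive)."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in KEYWORDS)
    ]


# Made-up sample text, not output from a real tool.
sample = "step 1 done\nERROR: index file does not exist\nstep 2 skipped"
for line in find_error_lines(sample):
    print(line)
```

A keyword filter like this will miss errors phrased differently, so treat an empty result as "keep reading the log", not as "no error".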

Here we see an error produced by the HaplotypeCaller command in our task. The message indicates that the index file for the FASTA reference does not exist. Hmm, it seems there's something wrong with the FASTA index we provided. Click Done to go back to the previous screen. Now we’ll check the inputs that were provided to the task by clicking Show for the Inputs.

Ah-ha, the reference index file we provided (InputBamIndex) is the index file for our sample (NA12878.bai), when it should be the index for the hg38 reference (Homo_sapiens_assembly38.fasta.fai). After correcting the method configuration to use the right index, the workflow passes!
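A mix-up like this one can often be caught before launching. As a minimal sketch, assuming the reference index is named after the FASTA file with ".fai" appended (as in the example above), a small pre-flight check might look like the following. The function name check_reference_index is hypothetical and is not part of FireCloud.

```python
from typing import List


def check_reference_index(fasta_path: str, index_path: str) -> List[str]:
    """Return a list of problems found with a FASTA/index input pair."""
    problems = []
    # Compare file names only, so bucket or directory prefixes are ignored.
    fasta_name = fasta_path.rsplit("/", 1)[-1]
    index_name = index_path.rsplit("/", 1)[-1]
    if not index_name.endswith(".fai"):
        problems.append(f"{index_name} does not look like a FASTA index (.fai)")
    # Assumption: the index is named <fasta file name>.fai
    if index_name != fasta_name + ".fai":
        problems.append(f"expected {fasta_name}.fai, got {index_name}")
    return problems


# The inputs from the failed submission, then the corrected pair.
print(check_reference_index("Homo_sapiens_assembly38.fasta", "NA12878.bai"))
print(check_reference_index("Homo_sapiens_assembly38.fasta",
                            "Homo_sapiens_assembly38.fasta.fai"))
```

The check looks only at file names, so it cannot tell whether the index was actually built from that FASTA. It is meant to catch a wrong attribute in the method configuration, nothing more.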

JES Log

The JES log is often difficult to decipher, so it's usually better to start with the other log files. However, in some cases your submitted job will fail with no stderr or stdout files. In these cases you’ll have to unravel the meaning behind the JES log messages. Below we’ve listed some common JES log errors and their possible meanings as an aid. There isn’t a solution for all of them, so feel free to post your error on the FireCloud forum so the team can help you work through the message.

  • PAPI (JES) Error 10 Message 15

    • The provided memory or disk space isn’t sufficient to load your input files and Docker container, thus the task instance exited abruptly. Increasing the disk space should avoid the error message.
  • PAPI (JES) Error 10 Message 14

    • This error is associated with preemptible VMs, but we've observed that it occasionally applies when non-preemptible machines fail. Retry the workflow. If you see this particular error on the same task repeatedly, we recommend using the runtime attribute maxRetries to retry transient failures.
  • PAPI Error 10 Message 13

    • The submission was aborted. There are many possible reasons for this; sometimes it is due to preemption.
  • PAPI Error 5: Message 10

    • This message does not mean that the files failed to delocalize, but rather that something went wrong upstream and the files it is trying to delocalize were never generated. Check the stderr or stdout for possible reasons why the files were not generated.
  • PAPI Error 2

    • The submission could not start, possibly due to preemption. The error message may include a description of the reason.
  • Cannot find credentials for RawlsUser(RawlsUserSubjectId(******),RawlsUserEmail(******))

    • Refresh your browser window, and you will see a yellow banner at the top of the page that says "Your offline credentials are missing or out-of-date. Your workflows may not run correctly until they have been refreshed. Refresh now...". Click the "Refresh now..." link and follow the prompts. This will update your credentials in the system and should make the error message go away. The prompt is needed to maintain compliance with certain security standards required for hosting sensitive data.
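If you find yourself looking up these codes often, the list above can be turned into a small lookup table. This is only a sketch: it assumes the error labels appear in the log in the same form as they are written in this document, and the names HINTS, PATTERN, and explain are made up for illustration. The hints are paraphrased from the list above.

```python
import re
from typing import Optional

# Keyed by (error code, message number).
# A message number of None matches any message for that error code.
HINTS = {
    (10, 15): "Memory or disk too small for inputs and Docker container; increase the disk space.",
    (10, 14): "Often preemption; retry, or set the maxRetries runtime attribute.",
    (10, 13): "Submission aborted; many possible reasons, sometimes preemption.",
    (5, 10): "Outputs were never generated; check stderr and stdout for the upstream failure.",
    (2, None): "Submission could not start, possibly due to preemption.",
}

# Assumed label shapes: "PAPI (JES) Error 10 Message 15" or "PAPI Error 5: Message 10".
PATTERN = re.compile(r"PAPI(?: \(JES\))? Error (\d+)(?::? Message (\d+))?")


def explain(line: str) -> Optional[str]:
    """Return a hint for a PAPI error label found in the line, or None if nothing matches."""
    match = PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    message = int(match.group(2)) if match.group(2) else None
    # Try the exact (code, message) pair first, then a code-only entry.
    return HINTS.get((code, message)) or HINTS.get((code, None))


print(explain("PAPI Error 5: Message 10"))
```

Labels that are not in the table return None, which is a good cue to post the full message on the FireCloud forum.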

Summary

  • The error examples discussed above are pretty simple and very common. Be sure to check your inputs before launching a workflow to avoid failed submissions.
  • Remember that when troubleshooting, your first stop should be the Monitor tab, where you can check the stdout, stderr, and JES log for your failed task.
  • In cases where there isn’t a stdout or stderr file, use the JES log and the explanations of common JES log messages in this document to help you solve the problem.
  • Of course, if you are having any trouble with FireCloud troubleshooting, you can ask your question on the FireCloud forum.

Additional Links:
FireCloud Forum
Quick Start Guide
FireCloud Tutorials
FireCloud FAQ

Post edited by bshifaw on