Workflow execution example on SGE

Hi there,

I was looking for a simple example of running a WDL script on an SGE cluster. I've been through the Cromwell and WDL documentation, but I was unable to find one and couldn't work out how to do it.

Any help on this would be appreciated.

Thanks in advance.

Best regards,
Santiago

Answers

  • santiagorevale (Argentina; Member)

    Hi Chris,

    Thanks for your quick reply!

    Would just setting up the options under "SGE" be enough to start using Cromwell with that config file? Or is there any other option I should be setting from the file?

    Also, what is the aim of Cromwell's database? I've read that by default it uses an in-memory database which only lives for the duration of the JVM. So what is its purpose, and what would I gain by setting it up with MySQL? By the way, will it work with SQLite?

    Thank you very much in advance for your help!

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    First things first, the configuration file:

    • First, you'll need to create your own configuration file (if you don't already have one)
    • Then, you'll want to create enough structure to copy in the SGE backend section, e.g.
    backend {
      default = "Local"
      providers {
        SGE {
          ...
          ...
        }
      }
    }
    
    • When you run Cromwell, you'll want to point at this new file:
    java -Dconfig.file=/path/to/config.conf -jar cromwell_XYZ.jar [run|server]
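
As a rough illustration of the steps above, here is a minimal shell sketch that writes a config file and a tiny WDL workflow, then launches Cromwell if a jar is available. Everything inside the SGE block is an assumption modelled on the SGE example in the Cromwell documentation, not the section elided above: the `qsub` flags, the optional `sge_queue` attribute, and the file names `sge.conf` and `hello.wdl` are placeholders to adapt to your site. Note that `default` is set to "SGE" here so that calls are dispatched to that backend rather than to "Local".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file names; adjust to your own setup.
CONF="sge.conf"
WDL="hello.wdl"
CROMWELL_JAR="${CROMWELL_JAR:-cromwell_XYZ.jar}"

# Quoted heredoc: the ${...} placeholders are written literally so that
# Cromwell (not the shell) substitutes them at submission time.
cat > "$CONF" <<'EOF'
# Start from Cromwell's built-in defaults, then override the backend.
include required(classpath("application"))

backend {
  default = "SGE"
  providers {
    SGE {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        # Assumed runtime attribute; declare whatever your tasks need.
        runtime-attributes = """
        String? sge_queue
        """
        # Assumed qsub invocation; flags depend on your cluster.
        submit = """
        qsub -terse -V -b y -N ${job_name} -wd ${cwd} -o ${out} -e ${err} \
          ${"-q " + sge_queue} \
          /usr/bin/env bash ${script}
        """
        kill = "qdel ${job_id}"
        check-alive = "qstat -j ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }
  }
}
EOF

# A one-task workflow (draft-2 WDL syntax) to check that jobs reach SGE.
cat > "$WDL" <<'EOF'
task hello {
  String name
  command {
    echo "Hello, ${name}!"
  }
  output {
    String greeting = read_string(stdout())
  }
}

workflow hello_sge {
  call hello { input: name = "SGE" }
}
EOF

# Only launch Cromwell when Java and the jar are actually present.
if command -v java >/dev/null 2>&1 && [ -f "$CROMWELL_JAR" ]; then
  java -Dconfig.file="$CONF" -jar "$CROMWELL_JAR" run "$WDL"
else
  echo "To run: java -Dconfig.file=$CONF -jar $CROMWELL_JAR run $WDL"
fi
```

While the workflow is running, `qstat` should list a job named after the call if the submission went through SGE.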
    

    Next, the database.

    • You're correct. By default Cromwell uses an in-memory HSQLDB for its database.
    • Cromwell also supports MySQL; you can enable it by copying the appropriate database section from the bottom of the same reference.conf.
    • I don't think SQLite will work. It is an embedded database rather than a client/server one, and as far as I know Cromwell has no supported configuration for it.
    • The main benefit of the database is being able to preserve run information between server restarts. This means you can
      • Stop a running Cromwell and restart it, and Cromwell will be able to pick up still-running jobs.
      • Query metadata and timing information for historical runs.
      • Use call caching: Cromwell can detect that you've already run an identical job and avoid wasting your time and money by re-running it. This happens at the call level, and it is deliberately conservative: it only reports a cache hit when it is certain that the call is identical to a previous one.
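
For reference, the database and call caching points above might look roughly like the following config, written out here by a small shell script. This is a sketch under assumptions: the key names follow recent Cromwell documentation (older releases use a different Slick driver key and a different MySQL driver class), and the host, schema name, user, and password are placeholders. The database section in the reference.conf that matches your jar is the authoritative version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name for the example.
DB_CONF="cromwell-db.conf"

# Quoted heredoc so the "$" in the profile name is written literally.
cat > "$DB_CONF" <<'EOF'
include required(classpath("application"))

# Reuse results of calls that are identical to earlier ones.
call-caching {
  enabled = true
}

# Assumed MySQL settings; every value below is a placeholder.
database {
  profile = "slick.jdbc.MySQLProfile$"
  db {
    driver = "com.mysql.cj.jdbc.Driver"
    url = "jdbc:mysql://localhost/cromwell?rewriteBatchedStatements=true"
    user = "cromwell_user"
    password = "change_me"
    connectionTimeout = 5000
  }
}
EOF

echo "Wrote $DB_CONF"
```

These stanzas can sit in the same file as the backend section; they are kept in a separate file here only to keep the example short.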

    Hope that helps!

  • santiagorevale (Argentina; Member)

    Thanks for your comprehensive reply, Chris. I'll give it a try and come back to you if anything goes wrong.

    Cheers!

  • andreyto (Gaithersburg, MD; Member)

    Chris,
    Could you elaborate on how call caching behaves, especially when it is used on a local cluster? I run Cromwell with a Torque backend.

    My main concern is to make a specific workflow resilient to failures of:

    • master node
    • compute node
    • Torque job on a compute node getting killed or failing due to transient errors

    In other workflow systems, this resilience is achieved by:

    • the workflow engine restarting a failed job up to a configured number of times;
    • the workflow engine picking up the workflow state when the engine is restarted ("make" style: only run tasks that were not finished before).

    I saw someone submit a pull request for automatic and optional job resubmission, which was rejected on the grounds that resubmission would have to be intelligent in order to resubmit only on transient failures. As far as I can tell, resubmission is implemented for Google Cloud jobs, but not for local clusters.

    Picking up workflow state on restart: how does that work, assuming Cromwell is using MySQL? If some tasks have already finished OK, will they be recomputed? If not recomputing them depends on caching, then there is this question: for tasks that have already finished, do their outputs have to be present in the run directory for the cache hit to be detected? Ideally, as long as the inputs to a task that has not finished yet are present, it should not matter whether all the intermediate inputs and outputs that led to those inputs in the DAG are present. Having to keep all intermediate files in order to benefit from caching would dramatically increase storage use.

    Also, what happens if a Torque job is killed by Torque for running past its wall clock limit? It appears that Cromwell monitors the creation of a job status file written by the job wrapper script. If the entire job is killed with signal 9, the status file will never be created. Will Cromwell wait forever?

    Thanks,
    Andrey

  • andreyto (Gaithersburg, MD; Member)

    @ChrisL, could you please look at my question above? Maybe I should have started a new thread instead; I do not know how posts get noticed. Thanks!

  • ChrisL (Cambridge, MA; Member, Broadie, Moderator, Dev admin)

    Hi @andreyto -

    I have an answer, but that's a good idea. Could you ask this as a new question? I think that would be valuable for future visitors to the forum, since it looks like a new question rather than a continuation of the previous one. Thanks!

  • Hannah66 (Taiwan; Member)

    Hi @ChrisL

    I'm trying to run workflows on SGE with this command:
    java -Dconfig.file=./reference.conf -jar ./cromwell-28_2.jar run ./workflow.wdl ./pipeline_.json
    But jobs are still not being submitted through SGE. I would like to know how to make sure my main.wdl is set up correctly. Do you have any examples to share, such as the tutorials at https://github.com/broadinstitute/wdl/tree/develop/scripts/tutorials/wdl ?

    Many Thanks, Hannah

  • kshakir (Broadie, Dev ✭✭)

    Hi @Hannah66, can you submit this as a new question, so that future visitors of the forum can follow the separate discussion? Thanks!
