submit workflows.wdl on SGE for tutorials

Hi there,
I'm trying to run workflows on SGE with the following command: java -Dconfig.file=./reference.conf -jar ./cromwell-28_2.jar run ./workflow.wdl ./pipeline_.json
But jobs are still not being submitted through SGE. I would like to know how to make sure the settings in my main.wdl are correct. Do you have any examples you could share, such as the tutorials at https://github.com/broadinstitute/wdl/tree/develop/scripts/tutorials/wdl ?

Many Thanks, Hannah

Answers

  • ChrisL, Cambridge, MA (Member, Broadie, Moderator, Dev admin)

    Hi @Hannah66 -

    • You shouldn't need to use the entire reference.conf; it should be enough to follow the customization instructions here.
    • If you have a backend.providers.SGE block in your configuration file, is your backend.default set to SGE? Otherwise Cromwell will continue to default to local execution.
    • The WDL should be agnostic of where it gets run. As long as Cromwell is configured correctly, the backend is taken care of for you.

    Hope that helps!

  • Hannah66, Taiwan (Member)

    Hi @ChrisL ~
    Thank you for the information. I'm still confused about how to call SGE from my main.wdl. Could you please give me some examples?

    Here is my current main.wdl:
    workflow Tutorial {
      File GATK
      File input_bam
      String tmp
      File ref_fasta
      File ref_dict
      File ref_alt
      File ref_amb
      File ref_ann
      File ref_bwt
      File ref_pac
      File ref_sa
      File ref_fai
      Array[String] chrs

      scatter (chr in chrs) {
        call hc_full_bam_chrs.hc_full_bam_chrs {
          input:
            GATK = GATK,
            input_bam = input_bam,
            tmp = tmp,
            ref_fasta = ref_fasta,
            ref_dict = ref_dict,
            ref_alt = ref_alt,
            ref_amb = ref_amb,
            ref_ann = ref_ann,
            ref_bwt = ref_bwt,
            ref_pac = ref_pac,
            ref_sa = ref_sa,
            ref_fai = ref_fai,
            chr = chr
        }
      }
    }

    Best, Hannah

  • Hannah66, Taiwan (Member)

    Hi @kshakir ,
    Thanks for the information. I have another question, about CPU and memory. My jobs use different amounts of CPU and memory. If I set the cpu and memory parameters in sge.conf, can I use hello.wdl to change the cpu/memory settings, or do I need to edit sge.conf every time I submit a different job?

    Many Thanks, Hannah

  • Hannah66, Taiwan (Member)

    Hi @kshakir , I guess I could use "runtime { memory: ${memory}}" to control this :) Thank you for your help~

    Many Thanks, Hannah

  • ChrisL, Cambridge, MA (Member, Broadie, Moderator, Dev admin)

    @Hannah66 that's exactly right! If you set them in the runtime attributes, then Cromwell includes them in the SGE submission command (as specified in the configuration file).
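
    As a minimal sketch of what that can look like: the task name say_hello and the inputs name, cpu and memory_gb below are made up for illustration, and the sketch assumes that the SGE block in the configuration file declares cpu and memory_gb under runtime-attributes, which is what ${cpu} and memory_gb placeholders in a qsub submit string rely on.

    ```wdl
    # Hypothetical task: names and values are illustrative only.
    task say_hello {
      String name
      Int cpu          # number of slots to request
      Int memory_gb    # memory to request, in GB

      command {
        echo "Hello ${name}"
      }

      runtime {
        # Assumed to reach the submit command as ${cpu}
        cpu: cpu
        # A memory string with a unit; Cromwell is assumed to convert it
        # to the memory_gb value declared in the backend configuration
        memory: memory_gb + " GB"
      }

      output {
        String greeting = read_string(stdout())
      }
    }

    workflow hello {
      call say_hello
    }
    ```

    Under those assumptions, each run can be given different cpu and memory_gb values through the inputs JSON, without editing sge.conf.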

    I hope you managed to get it all working!

  • Hannah66, Taiwan (Member)

    Hi @ChrisL @kshakir ,
    I have a question about sge.conf. If I add "-hold_jid" to the qsub command, how can I get the value for ${ex_job_name}?
    I know the job name is the string: cromwell__. When I submit a new main.wdl, I have to find ${ex_job_name} to make sure the workflows run in order. Do you have any examples?

    My command line:
    order1: java -Dconfig.file=./sge.conf -jar cromwell-28_2.jar run ./hello.wdl ./hello.json
    order2: java -Dconfig.file=./sge.conf -jar cromwell-28_2.jar run ./hello2.wdl ./hello2.json

    My sge.conf, for example:

    submit = """
    qsub \
    -terse \
    -V \
    -b n \
    -N ${job_name} \
    -hold_jid ${ex_job_name} \
    -wd ${cwd} \
    -o ${out} \
    -e ${err} \
    -pe smp ${cpu} \
    ${"-l h_vmem=" + memory_gb + "g"} \
    ${"-q " + sge_queue} \
    ${"-P " + sge_project} \
    ${script}
    """

    Many Thanks, Hannah

  • Thib, Cambridge (Member, Broadie, Dev ✭✭)

    Hi @Hannah66 ,

    I'm not very familiar with SGE, but if I understand your use case, you're trying to use hold_jid to set a dependency between two workflows?
    Have you considered using SubWorkflows instead?

    Thibault

  • ChrisL, Cambridge, MA (Member, Broadie, Moderator, Dev admin)

    @Hannah66 I'm afraid I don't know SGE or qsub in detail. Could I ask for a little more context?

    • What would the -hold_jid option you want to supply do?
    • Is that something that's in the Cromwell docs already, or a new SGE option you want to add?
    • What do you mean by "catch my ${ex_job_name}"?
    • Are you aware that Cromwell's main job is to organize the task ordering for you? You shouldn't have to do that yourself manually!

  • Hannah66, Taiwan (Member)

    Hi~
    I would like to control the CPU and memory for each step's WDL. Setting cpu/memory in runtime is only supported on JES and SGE, not on Local: https://software.broadinstitute.org/wdl/documentation/topic.php?name=wdl-components
    I know Cromwell's main job is to organize the task ordering for me, but I have to limit my resource usage, such as CPU and memory.
    I'm trying to use "-hold_jid" to set a dependency between two workflows because they use different cpu/memory.
    To use "-hold_jid", I need to know the name of the first job, so that the second job waits until the first one is done. So I need to know how to get the name of my first job. Do you have any ideas?

    Many Thanks, Hannah Lin

  • ChrisL, Cambridge, MA (Member, Broadie, Moderator, Dev admin)

    You are correct, only JES and SGE will take notice of the cpu and memory runtime options. I'm confused because I thought that you were submitting to the SGE backend? Otherwise how would you ever get a jid to hold on?

    Well, never mind. If you want to limit how many jobs Cromwell is running concurrently, then a neater way to do this is to use Cromwell's built-in concurrent-job-limit option in your backend config, e.g.:

    backend {
      default = "Local"
      providers {
        Local {
          ...
          config {
            concurrent-job-limit = 2 # Only allow two jobs to run at any one time
          }
        }
      }
    }
    

    This option lets you say "when I'm using this backend, I only want 2 jobs to be running concurrently".

  • Hannah66, Taiwan (Member)

    Hi @kshakir ,
    I have a question about "-hold_jid" with SGE. If I add it to sge.conf, how can I get the name of my previous job to pass to "-hold_jid"? Do you have any ideas?

    submit = """
    qsub \
    -terse \
    -V \
    -b n \
    -N ${job_name} \
    -hold_jid ${previous_job_name} \
    -wd ${cwd} \
    -o ${out} \
    -e ${err} \
    -pe smp ${cpu} \
    ${"-l h_vmem=" + memory_gb + "g"} \
    ${"-q " + sge_queue} \
    ${"-P " + sge_project} \
    ${script}
    """
    

    Many Thanks, Hannah

  • kshakir (Broadie, Dev ✭✭)

    Hi @Hannah66,

    I haven't used -hold_jid with Grid Engine before. But from a Google search, it appears that this command line option specifies the dependencies between jobs inside Grid Engine. This seems like it would be redundant in the current versions of Cromwell, as part of Cromwell's duty is to also track job dependencies. If call B depends on call A's outputs, then B will not be sent to Grid Engine until A completes.

    Pre-scheduling dependencies, on any of the backends, is not currently supported or even well thought out for Cromwell. If you'd like to discuss ways that Cromwell could potentially submit job dependencies ahead of time, please start a new forum post and the discussion can continue there.
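
    To illustrate the point about call ordering, here is a minimal sketch with two hypothetical tasks (align_step and call_step, with made-up commands and resource values) in a single workflow. Because call_step consumes an output of align_step, Cromwell should only submit it after align_step completes. Each task also carries its own runtime block, so the two steps can request different cpu/memory without -hold_jid. This assumes the backend's runtime-attributes declare cpu and memory_gb.

    ```wdl
    # Hypothetical first step: requests more resources.
    task align_step {
      String sample

      command {
        echo "aligned ${sample}" > aligned.txt
      }

      runtime {
        cpu: 8
        memory: "16 GB"
      }

      output {
        File aligned = "aligned.txt"
      }
    }

    # Hypothetical second step: requests fewer resources.
    task call_step {
      File aligned

      command {
        cat ${aligned}
      }

      runtime {
        cpu: 2
        memory: "4 GB"
      }

      output {
        String summary = read_string(stdout())
      }
    }

    workflow ordered {
      String sample

      call align_step { input: sample = sample }

      # Using align_step.aligned as an input is what creates the dependency.
      call call_step { input: aligned = align_step.aligned }
    }
    ```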

  • blueskypy (Member ✭✭)
    edited December 2018

    Hi @kshakir @ChrisL ,
    For the SGE backend, where should the values of the variables in the qsub command, such as ${job_name}, ${cwd}, ${out}, etc., be supplied?

    If I don't want to list them in the runtime attributes block of the WDL scripts, can I provide the values in --options xxx.json?

    Also, what is the content of ${script}?

    Thanks!

  • blueskypy (Member ✭✭)

    Based on my own trial, job_name cannot be supplied by users. I added job_name to the WDL task runtime block, but the Cromwell run gave the error “Key/s [job_name] is/are not supported by backend. Unsupported attributes will not be part of job executions.”

  • jmfunt, Cambridge (Member)

    I know that this is supported, but I'm hoping to get some insight into what I'm not doing correctly. I'd like SGE to deploy containers, and the documentation here indicates that this can work:
    https://cromwell.readthedocs.io/en/stable/backends/SGE/

    But for my configuration:
    submit-docker = """
    qsub \
    -terse \
    -V \
    -b n \
    -N ${job_name} \
    -wd ${cwd} \
    -o ${out} \
    -e ${err} \
    -pe smp ${cpu} \
    -l docker,docker_images="${docker}"
    -xdv ${cwd}:${docker_cwd}
    ${script}
    """

    The error is:
    Caused by: wdl.draft2.parser.WdlParser$SyntaxError: ERROR: Variable docker does not reference any declaration in the task (line 35, col 36): -l docker,docker_images="${docker}"
    and the error is accurate.

    When looking at the stack trace:

    ```
    task submit_docker {

    String job_id
    String job_name
    String cwd
    String out
    String err
    String script
    String job_shell

    String docker_cwd
    String docker_cid
    String docker_script
    String docker_out
    String docker_err
    ```

    You can see that, contrary to the readthedocs link, docker does not seem to be defined as a runtime attribute. Any insight as to what I'm overlooking?
