
AWS Batch - how to use pre-existing job definition or specify docker container

The documentation for using Cromwell with AWS Batch does not explain how to use a pre-existing job definition or how to specify the Docker image used. If I run the demo, it creates a job definition using ubuntu:latest. How do I specify a different Docker image?


Answers

  • dtenenba Member

    I also need to know how to specify an instance role so that the job can have access to the S3 bucket that I have specified.

  • dtenenba Member

    OK, I figured out how to set the docker image. I do that in my WDL file (I had assumed it was done in the AWS configuration file).
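
    For anyone else hitting this, the relevant bit of WDL is the task's runtime section; mine looks roughly like this (the task name and image are just placeholders):

        task say_hello {
            command {
                echo "hello from a custom image"
            }
            runtime {
                docker: "python:3.7-slim"
                memory: "2 GB"
                cpu: 1
            }
        }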

    But I still need to know how to set a job role ARN so that my job can access the S3 bucket defined in aws.conf. I see references to "assume_role", "base-auth", and "role-arn", but I'm not sure what "base-auth" should be, and I don't know what the syntax for this should be in the conf file. An example would be wonderful.

    Thanks.

  • wleepang Member

    Currently, jobs submitted from Cromwell do not use job roles. Access to S3 in this case would be managed via the InstanceProfile that is attached to instances launched from Batch. You would do this from the IAM console.

    If you use the "All-in-One" CloudFormation template provided at:

    docs.opendata.aws/genomics-workflows/cromwell/cromwell-aws-batch
    

    everything will be preconfigured and running.
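
    If you are managing the role yourself rather than using the template, granting S3 access comes down to attaching a policy along these lines to the instance role in IAM (the bucket name is a placeholder):

        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Action": ["s3:ListBucket"],
                    "Resource": "arn:aws:s3:::<YOUR_BUCKET>"
                },
                {
                    "Effect": "Allow",
                    "Action": ["s3:GetObject", "s3:PutObject"],
                    "Resource": "arn:aws:s3:::<YOUR_BUCKET>/*"
                }
            ]
        }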

  • dtenenba Member

    Hi Lee, nice to hear from you (this is Dan Tenenbaum).

    Currently at my institution, each PI has their own S3 bucket, but they all share a single AWS account (we are in the process of changing this, but that's how it is for now). So we would like each user to use only their PI's bucket.

    I was actually able to get a job to work with a different bucket than the one that's baked into the config (created by CloudFormation) by adding the following to my aws.conf:

    # in auths section:
        {
            name = "assume-role-based-on-another"
            scheme = "assume_role"
            base-auth = "default"
            role-arn = "arn:aws:iam::<ACCOUNT_NUMBER>:role/<NAME_OF_ROLE>"
        }

    Then I changed root to the bucket associated with role-arn, and changed engine.filesystems.s3.auth to assume-role-based-on-another.
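
    Concretely, those other two changes look roughly like this (the provider name here follows the sample config that CloudFormation generated; adjust it to match yours):

        engine {
            filesystems {
                s3 {
                    auth = "assume-role-based-on-another"
                }
            }
        }

        backend {
            providers {
                AWSBatch {
                    config {
                        root = "s3://<BUCKET_ASSOCIATED_WITH_THE_ROLE>/cromwell-execution"
                        # ... rest of the backend config unchanged
                    }
                }
            }
        }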

  • wleepang Member

    Hi Dan! Had a hunch it was you from your handle.

    Glad to hear you've got something working. I'm curious, is your use case to have a persistent and shared Cromwell server? Would it work for your users to have an ephemeral one that is configured to use their PI's bucket on the fly?

  • dtenenba Member

    We haven't thought it through that far. At present we're just using the command line and not the server. The problem with having a persistent and shared Cromwell server is that 1) there is no auth on the server (that's not really a problem, we can add that), and 2) there's no way to track which user submitted a job to the server, which we really need to know. Is this something that may be on the roadmap, do you know?

    Also, how would it work to have an ephemeral server that's configured to use the user's PI bucket on the fly?

    Thanks.

  • wleepang Member

    You can launch an EC2 instance with Cromwell installed and appropriately configured, either through the console or from the command line (a rough CLI sketch is at the end of this post), using the following CloudFormation template:

    docs.opendata.aws/genomics-workflows/cromwell/cromwell-aws-batch/#cromwell-server
    

    As for tracking who submits jobs, I'm not sure how that would be achieved.
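
    For the command-line route, the launch would look roughly like this; the template URL and parameter key here are placeholders, so check the template for the parameters it actually expects:

        aws cloudformation create-stack \
            --stack-name cromwell-server \
            --template-url https://<BUCKET_HOSTING_THE_TEMPLATE>.s3.amazonaws.com/cromwell-server.template.yaml \
            --parameters ParameterKey=S3BucketName,ParameterValue=<PI_BUCKET> \
            --capabilities CAPABILITY_IAM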
