
Using genomes-in-the-cloud with Cromwell

We are trying to modify our pipelines to run with Docker, specifically the genomes-in-the-cloud image (https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/).

After some experimenting, I figured out that to run Picard from the image I need to run docker run broadinstitute/genomes-in-the-cloud:2.3-1498756809 java -jar picard.jar from the command line. However, I can't get Picard to work with WDL/Cromwell:

task reference {
    command {java -jar picard.jar}
    runtime {docker: "broadinstitute/genomes-in-the-cloud:2.3-1498756809"}
    output {File out = stdout()}
}

$ cat cromwell-executions/variantDiscovery/811e2ef6-cccd-4239-855d-759d93a68db7/call-reference/execution/stderr
Error: Unable to access jarfile picard.jar

I looked at the script.submit file and noticed that Cromwell runs docker <...> /bin/bash script rather than docker <...> script, which I think is related to the problem. I've tried running docker run broadinstitute/genomes-in-the-cloud:2.3-1498756809 /bin/bash true, and other commands in place of true, from the command line, but I keep getting /bin/true: /bin/true: cannot execute binary file.

Any suggestions on how to solve this would be appreciated.

Best Answer


  • ChrisL, Cambridge, MA (Member, Broadie, Dev admin)

    One thing missing from my previous comment: Cromwell creates a wrapper script for every command. That script does some bookkeeping and moves files into place at the end, as well as running the command you asked for. It's that script that the Docker image runs, not your command directly.
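
Because the command runs from inside Cromwell's wrapper script, a relative path such as picard.jar is resolved against the directory that script runs in, not against wherever the jar lives in the image. A small diagnostic task along the following lines can show where the command runs and where the jar is. This is only a sketch: the task name locate_picard is hypothetical, and it assumes the find utility is available in the image.

```wdl
# Hypothetical diagnostic task (not from the original thread).
task locate_picard {
    command {
        # Print the directory the wrapper script runs the command from.
        pwd
        # Search the image for the jar; "|| true" keeps the task from failing
        # if find returns a non-zero code for unreadable directories.
        find / -name picard.jar 2>/dev/null || true
    }
    runtime {
        docker: "broadinstitute/genomes-in-the-cloud:2.3-1498756809"
    }
    output {
        # First line is the working directory, the rest are matching paths.
        Array[String] lines = read_lines(stdout())
    }
}
```

Whatever absolute path the task reports for the jar can then be used in place of the relative path in the real command block.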

  • @ChrisL
    Thanks for your help. I got it working after specifying the full path to the jar inside the Docker image, as you suggested.

    For reference, this is the working task I ended up with:

    task reference {
        File ref
        # Strip the leading directories, then the .fasta extension
        String basename = sub(ref, "^/.*/","")
        String refname  = sub(basename, "\\.fasta$", "")
        command {
            cp "${ref}" "${basename}"
            java -jar /usr/gitc/picard.jar CreateSequenceDictionary R="${basename}" O="${refname}.dict"
            /usr/local/bin/samtools faidx "${basename}"
            /usr/gitc/bwa index "${basename}"
        }
        runtime {
            docker: "broadinstitute/genomes-in-the-cloud:2.3-1498756809"
        }
        output {
            File out = stdout()
        }
    }

  • ChrisL, Cambridge, MA (Member, Broadie, Dev admin)

    That's great! And thanks for posting your successful task so that others can use it as a reference!

    One more comment: if you're using Cromwell 27 or above, you might like to check out the new 'basename' function, which should make your basename variables a bit easier to read.
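
As a rough sketch of that suggestion, the working task above could be rewritten with basename() along these lines. It assumes Cromwell 27 or later. The variable names fasta_name and ref_prefix are illustrative and not from the original post; they replace the original basename variable so that it does not share a name with the function.

```wdl
task reference {
    File ref
    # basename(path) drops the leading directories;
    # basename(path, suffix) also removes the given suffix.
    String fasta_name = basename(ref)            # e.g. "genome.fasta" for /data/genome.fasta
    String ref_prefix = basename(ref, ".fasta")  # e.g. "genome"

    command {
        cp "${ref}" "${fasta_name}"
        java -jar /usr/gitc/picard.jar CreateSequenceDictionary R="${fasta_name}" O="${ref_prefix}.dict"
        /usr/local/bin/samtools faidx "${fasta_name}"
        /usr/gitc/bwa index "${fasta_name}"
    }
    runtime {
        docker: "broadinstitute/genomes-in-the-cloud:2.3-1498756809"
    }
    output {
        File out = stdout()
    }
}
```

The two basename() calls stand in for the pair of sub() regular expressions, so the intent of each variable can be read without decoding a pattern.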
