Using Docker Content Trust with Cromwell

Hi all,

I host the Docker images I use for my Cromwell analyses on Docker Hub. I want to make sure that I can trust the images that I pull from Docker Hub (to protect against man-in-the-middle attacks, a compromised Docker Hub account, etc.). Luckily, Docker has a feature called Content Trust, which enables you to cryptographically sign your images. However, this only works when you work with tags (i.e. docker pull broadinstitute/genomes-in-the-cloud:2.3-1498756809), but Cromwell uses hashes internally.

From a random script.submit:

docker run --rm -v <path> -i broadinstitute/<image>@sha256:e36609f714e301ee40c632b62422d

Is there a way to use Content Trust with Cromwell? The ability to verify the authenticity of each step of our analysis is very important to us.

Answers

  • danb Member, Broadie ✭✭✭

    Hm, tags should work as intended. Which version of cromwell are you using?

  • danb Member, Broadie ✭✭✭

    I misunderstood the issue. Indeed, we do not currently support the Content Trust tagging mechanism and have no plans to do so.

    I believe you could use the hash directly to work around this issue.

  • @danb
    Can you elaborate on using the hash directly? According to the docker website, Content Trust is based on tags, and does not work when using the hash directly.
    "with content trust enabled a docker pull someimage:latest only succeeds if someimage:latest is signed. However, an operation with an explicit content hash always succeeds as long as the hash exists"
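
The rule quoted above depends only on the form of the image reference. Here is a minimal Python sketch of that distinction. The function names are hypothetical, the parsing is simplified, and the code only classifies a reference string; it does not talk to Docker or to a registry.

```python
from typing import Optional, Tuple


def split_image_reference(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split an image reference into (name, tag, digest).

    Simplified parsing: the digest is whatever follows '@', and the tag is
    whatever follows a ':' in the last path segment (so a registry port such
    as localhost:5000 is not mistaken for a tag).
    """
    name, _, digest_part = ref.partition("@")
    digest = digest_part or None
    tag = None
    last_segment = name.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, _, tag = name.rpartition(":")
    if digest is None and tag is None:
        # Docker falls back to the "latest" tag when none is given.
        tag = "latest"
    return name, tag, digest


def signature_required(ref: str) -> bool:
    """True if, per the quoted documentation, a pull with content trust
    enabled would only succeed for a signed tag. References carrying an
    explicit digest are not subject to that check."""
    _, _, digest = split_image_reference(ref)
    return digest is None


print(signature_required("broadinstitute/genomes-in-the-cloud:2.3-1498756809"))  # True
print(signature_required("someimage@sha256:" + "0" * 64))  # False
```

This is why a reference that has already been rewritten to name@sha256:... is pulled without any signature check, regardless of whether content trust is enabled on the client.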

  • mcovarr Cambridge, MA Member, Broadie, Dev

    Hi @Redmar_van_den_Berg, I'm not totally sure you should be using Content Trust to solve this problem. I've only read a little about Content Trust, but that seems to allow tags to move relative to images which would compromise scientific reproducibility. We generally recommend specifying Docker images by SHA digest for reproducibility, and I believe this should also address your concerns about integrity.

  • Hi @mcovarr. I understand the argument to use the SHA digest for scientific reproducibility, since the hash can never change, but the content behind a tag might change. Using the SHA digest solves this problem.
    Content Trust is about verifying that any given tag was actually pushed by the legitimate admin of that repository. If I gain access to the Broad Docker account and push a modified image under an existing tag, the use of the SHA digest by Cromwell does not protect against this. Docker will simply look up the SHA digest of the malicious image, and Cromwell will use that to pull the image and run the analysis.
    By using the SHA digest, you can be sure that you are using the same image as before. By using Content Trust, you can be sure that the image was created and uploaded by the legitimate administrator of that repository.

    Would it not be better if cromwell used the tag for pulling and running a docker image, and only used the SHA digest internally for things like call caching? That way, users that want to use the SHA digest can simply set it in the runtime options instead of the tag. And users that want to use Content Trust can use the tags.

    If I run the same analysis on two different computers two days apart, the fact that Cromwell uses the SHA digest behind the scenes does not help me if the admin changes the content of the Docker tag in between the two analyses. In both cases, Cromwell will simply fetch the image from Docker, check the SHA digest of the different images, and then use the different SHA digests to run the analysis. So even with the SHA digest, I will get two different results from running the same analysis.
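
The scenario above can be illustrated with a small Python simulation. The dictionary is a hypothetical in-memory stand-in for a registry (no real registry or Docker call is involved), and the image name and digests are made up for the example.

```python
# Hypothetical stand-in for a registry: maps "name:tag" to an image digest.
registry = {"myorg/tool:1.0": "sha256:" + "a" * 64}


def pull(ref: str, registry: dict) -> str:
    """Return the digest of the image that a pull of `ref` would run."""
    if "@" in ref:
        # Digest reference: the digest is part of the reference itself.
        return ref.split("@", 1)[1]
    # Tag reference: whatever the tag points at right now.
    return registry[ref]


tag_ref = "myorg/tool:1.0"
pinned_ref = "myorg/tool@" + registry[tag_ref]

# First run.
day1_tag, day1_pinned = pull(tag_ref, registry), pull(pinned_ref, registry)

# Between the runs, someone pushes a different image under the same tag.
registry[tag_ref] = "sha256:" + "b" * 64

# Second run, two days later.
day3_tag, day3_pinned = pull(tag_ref, registry), pull(pinned_ref, registry)

print(day1_tag == day3_tag)        # False: the tag was resolved again and moved
print(day1_pinned == day3_pinned)  # True: the digest written in the workflow is fixed
```

Resolving a tag to a digest at run time only records which image was used. Neither outcome says anything about who pushed the image, which is the separate question Content Trust addresses.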

  • mcovarr Cambridge, MA Member, Broadie, Dev

    Hi @Redmar_van_den_Berg

    Cromwell currently tries to guarantee that all references to a given Docker tag within a workflow result in the same Docker image being used for each reference. This is not compatible with pulling images by tag since the image corresponding to a tag can change while the workflow is running.
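
One way to picture the guarantee described above is a per-workflow lookup that resolves each tag once and reuses the answer. The sketch below is only an illustration of that idea, not Cromwell's implementation; the class name and the lookup callable are hypothetical, and references are assumed to have the form name:tag or name@digest.

```python
from typing import Callable, Dict


class WorkflowImageResolver:
    """Resolve each tag reference at most once for the lifetime of a workflow."""

    def __init__(self, lookup_digest: Callable[[str], str]) -> None:
        # lookup_digest stands in for a registry query: "name:tag" -> digest.
        self._lookup_digest = lookup_digest
        self._resolved: Dict[str, str] = {}

    def resolve(self, ref: str) -> str:
        if "@" in ref:
            # Already pinned by digest; nothing to look up.
            return ref
        if ref not in self._resolved:
            name = ref.rsplit(":", 1)[0]
            self._resolved[ref] = f"{name}@{self._lookup_digest(ref)}"
        return self._resolved[ref]


# Hypothetical registry state that changes while the workflow is running.
tags = {"myorg/tool:1.0": "sha256:" + "a" * 64}
resolver = WorkflowImageResolver(lambda ref: tags[ref])

first = resolver.resolve("myorg/tool:1.0")
tags["myorg/tool:1.0"] = "sha256:" + "b" * 64  # the tag moves mid-workflow
second = resolver.resolve("myorg/tool:1.0")

print(first == second)  # True: both references run the same image
```

Pulling by tag for every task would skip this pinning step, so two tasks in the same workflow could end up on different images if the tag moved between them.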

    The recommended practice is to specify images by hash. Two different analyses run days apart will get the same result if Docker images are specified by hash and not by tag. Cromwell allows for symbolic representation of Docker images in WDL through expressions.

    In workflow inputs:

    {
      "w.docker_tag": "<image>@sha256:34471448724419596ca4e890496d375801de21b0e67b81a77fd6155ce001edad"
    }
    

    In WDL:

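    # The image reference is passed in as an input, so a digest-pinned
    # reference can be supplied without editing the task.
    task t {
      String docker_tag
      command {
        echo "Pulling Docker image with tag ${docker_tag}"
      }
      runtime {
        docker: "${docker_tag}"
      }
    }
    
    workflow w {
      # Set through "w.docker_tag" in the workflow inputs.
      String docker_tag
      call t { input: docker_tag = docker_tag }
    }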
    

    If you still really want to use Docker Content Trust, we already have an enhancement request to be able to turn off Docker hash lookups which should allow you to do this, but turning off hash lookups will also disable call caching. If call caching doesn't matter to you I can file a follow-on enhancement request to support Docker Content Trust.

    https://github.com/broadinstitute/cromwell/issues/2600
