Heads up: Important update scheduled Feb 6 will disrupt running workflows and affect data sharing
We are planning to release an important update that will change how the system identifies user access permissions to Google buckets in order to support a technique that renders certain workflows substantially more cost-effective.
This update is scheduled for next Tuesday, February 6 around 4:00 pm EST. At that time, all running workflows will fail and will be automatically restarted using call-caching once the update is complete. We recommend you hold off on launching any long-running workflows until after the release if you want to avoid having your workflow stopped and restarted.
This update will break some functionality if you have data stored in private buckets that are NOT FireCloud workspace buckets. It also deprecates the use of private docker images in Docker Hub, which remain functional for now but will be disabled in 30+ days (exact date will be communicated separately). If you think either applies to you (or might apply to you in the future), please read on to learn more about what is going to change, why we are making this change and what you need to do to resolve any problems that arise for you.
If you're in a hurry, feel free to zip straight through to the last section, but I think you'll find the instructions will make more sense if you read the full explanation.
What is going to change?
Currently, when you launch a workflow submission, the workflow is run under your user identity, and the system uses a form of authorization credential called a refresh token, which is a sort of key, to unlock access to any cloud resources (data buckets etc.) authorized to your user identity. We store a copy of this key on every Virtual Machine (VM) instance that is recruited to run your workflow, and the key is used to retrieve whatever data files the task running on that VM needs from their respective Google buckets.
Starting next week, we will abandon these refresh tokens and switch to using service accounts for authorizing access to cloud resources. A service account is a special kind of account that belongs to the system and manages permissions on your behalf. The system will create a new service account for you for each FireCloud project that you have access to, and all of your service accounts will be collected into a proxy group. We call these service accounts pets because they follow you everywhere and open doors for you (what can I say, my cat is very skilled). When you launch a workflow submission, the workflow will be run under your pet's identity, and the system will determine whether your pet has access to the necessary resources by checking whether the proxy group it belongs to has been given access to those resources.
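If it helps to see the pet and proxy group relationship laid out, here is a minimal sketch in Python. It is a toy model of the idea described above, not FireCloud's actual implementation: the class names, the `pet_for` and `can_read` helpers, and every email address and naming scheme in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProxyGroup:
    """Collects every pet service account that belongs to one user."""
    email: str
    pets: set = field(default_factory=set)


@dataclass
class Bucket:
    """A bucket plus the group emails that have been granted read access."""
    name: str
    readers: set = field(default_factory=set)


def pet_for(user: str, project: str, group: ProxyGroup) -> str:
    """Create one pet per user per project and add it to the user's proxy group."""
    pet = f"pet-{user}@{project}.example"  # hypothetical naming scheme
    group.pets.add(pet)
    return pet


def can_read(pet: str, group: ProxyGroup, bucket: Bucket) -> bool:
    """A pet gets in only if the proxy group it belongs to was granted access."""
    return pet in group.pets and group.email in bucket.readers


group = ProxyGroup(email="proxy-alice@example.org")  # hypothetical address
pet_a = pet_for("alice", "project-a", group)
pet_b = pet_for("alice", "project-b", group)

shared = Bucket("shared-data", readers={group.email})
unshared = Bucket("other-data")  # never shared with the proxy group

print(can_read(pet_a, group, shared))    # True
print(can_read(pet_b, group, shared))    # True
print(can_read(pet_a, group, unshared))  # False
```

The point to notice is that the bucket never lists individual pets. Access is granted once to the proxy group, and every pet in that group inherits it.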
Why are we making this change?
I realize that what I just explained sounds way more convoluted than the refresh tokens business -- and indeed it is! That's why we originally used the tokens. Way simpler to set up. However, we found out recently that the tokens approach has some important limitations that prevent us from supporting a new technique for making workflows much more cost-effective.
As you may know (it's ok if you don't), when you run a task on a cloud VM, you normally have to localize the full data files involved to the VM's local storage (basically, the machine's hard drive). When you're working with very large files, such as whole genome sequence data, that ends up costing you time and money because you need to provision the VM with enough storage to hold them (the bigger the storage, the more expensive the machine) and wait for the full files to get transferred. Considering that in many cases you're only going to be operating on a small subset of the data (e.g. when you're scattering execution across genomic regions), this is wasteful and not awesome.
What is awesome, however, is that there's a new(ish) protocol called NIO that allows us to only retrieve the piece of the file we care about for a given task, which neatly blows away the localization problem and delivers substantial savings. See this YouTube video for a high-level explanation. For a rather impressive example of big-time savings enabled by this technique, keep an eye on the GATK blog, which will soon feature a blog post on that very topic. To be clear, the tool you're running has to be equipped to use NIO in order to make use of this, like GATK4 -- it's not purely a WDL feature.
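To make the difference concrete, here is a rough analogy in Python that uses a local temporary file as a stand-in for a large file in a bucket. This is not NIO itself (NIO works against remote storage from inside the tool), and the `read_whole` and `read_range` helpers are hypothetical names chosen for illustration. It only shows the contrast between pulling in everything and pulling in one slice.

```python
import os
import tempfile


def read_whole(path: str) -> bytes:
    """Localization-style access: pull in every byte of the file."""
    with open(path, "rb") as handle:
        return handle.read()


def read_range(path: str, start: int, length: int) -> bytes:
    """Slice-style access: jump to the region of interest and read only that."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(length)


# Build a 1 MiB stand-in for a large data file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)
    path = tmp.name

whole = read_whole(path)
piece = read_range(path, start=512, length=1024)
os.remove(path)

print(len(whole))  # 1048576
print(len(piece))  # 1024
```

Both calls return the same bytes for the region in question, but the second one touches only a thousandth of the file. Scale that up to a whole genome and a task that needs a single region, and you can see where the savings come from.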
The catch is that with NIO, you're essentially leaving the data transfer up to the tool you're running inside a given task, instead of relying on FireCloud to arrange it. By doing so, you're bypassing FireCloud's authorization protocols; the tool cannot see or use the FireCloud-managed tokens. So in the current state of the system, your tool can't get access to the data that it's supposed to retrieve. Counterproductive? Yes, quite.
On the bright side it doesn't cost you any money since nothing runs. Ahem.
Fast-forward to next week when the switch to pet service accounts is made: now the tool is running under your pet's identity, so access authorization will simply be based on whether your pet belongs to the right clique (i.e. proxy group). No more problems, NIO works like a charm, and you save a ton of green*. Woo.
*Color may vary based on your local currency
What YOU need to do
If you only use data that is either fully public (world-readable) or stored in FireCloud workspace buckets, and if you only use public docker images, you're golden. All you need to do is avoid launching any long-running workflows until after the update if you don't want to run the risk of them being stopped and restarted.
If you use data that is stored outside of FireCloud workspace buckets and is not fully public (world-readable), you may need to modify the access permissions of that data. If the data is already shared with a FireCloud group that you or your collaborators are a part of, no action will be required because that type of group already uses proxy groups for authorization purposes. Otherwise, you will need to share this data with either your (and/or your collaborators') proxy group OR with a FireCloud group that you (and/or your collaborators) are a part of. If you find yourself needing to do this for several people, we strongly recommend you use the FireCloud group option. You can find your proxy group information, in the form of an email address, on your profile page in the FireCloud portal.
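For those who manage bucket permissions through IAM policies, here is a small sketch of what adding a group as a reader looks like at the policy level. It works on a plain dictionary in the standard IAM policy JSON shape and makes no calls to Google Cloud. The assumptions are mine: that the proxy group can be granted access as a `group:` member, that the read-only `roles/storage.objectViewer` role is what you want, and that you would apply the resulting policy with whatever tool you normally use. The `grant_group_read` helper and all addresses are hypothetical.

```python
import copy

READER_ROLE = "roles/storage.objectViewer"  # Cloud Storage read-only object role


def grant_group_read(policy: dict, group_email: str) -> dict:
    """Return a copy of an IAM policy with the group added as an object viewer."""
    updated = copy.deepcopy(policy)
    member = f"group:{group_email}"  # assumes the proxy group is addressable as a group
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == READER_ROLE:
            # Reuse the existing reader binding and avoid duplicate members.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": READER_ROLE, "members": [member]})
    return updated


# Hypothetical starting policy and hypothetical proxy group address.
policy = {
    "bindings": [
        {"role": "roles/storage.admin", "members": ["user:owner@example.org"]}
    ]
}
new_policy = grant_group_read(policy, "my-proxy-group@example.org")
print(new_policy["bindings"][-1])
```

Swapping in a FireCloud group address instead of a proxy group address works the same way, which is why the group option scales better when several collaborators need access.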
If you use private docker images stored in Docker Hub in your methods, you should plan to make alternative arrangements ahead of their eventual disabling (exact date TBD). You can either make your docker images public on Docker Hub OR use the Google Container Registry (GCR) instead to host your private images. For background information about containers, docker, and GCR, read this Dictionary article. We are preparing some documentation summarizing the basics of how to publish images to the major docker repositories (Docker Hub, GCR and Dockstore). In the meantime, see this external Google document to learn how to publish your docker image in GCR.
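If you would rather see the cases above as a checklist, here is one way to write them down in Python. It is only a restatement of this section for a single dataset, and the `required_actions` function and its parameter names are hypothetical.

```python
def required_actions(
    data_public: bool,
    in_workspace_bucket: bool,
    shared_with_firecloud_group: bool,
    uses_private_docker_hub_images: bool,
) -> list:
    """List what a user should do before the update, per the cases in this section."""
    # Everyone is advised to avoid launching long-running workflows beforehand.
    actions = ["Hold off on long-running workflows until after the update."]

    # Private data outside workspace buckets needs sharing unless a
    # FireCloud group already covers it.
    if not (data_public or in_workspace_bucket or shared_with_firecloud_group):
        actions.append(
            "Share the data with your proxy group or a FireCloud group you belong to."
        )

    # Private Docker Hub images keep working for now but will be disabled later.
    if uses_private_docker_hub_images:
        actions.append("Make the images public on Docker Hub or move them to GCR.")

    return actions


for step in required_actions(
    data_public=False,
    in_workspace_bucket=False,
    shared_with_firecloud_group=False,
    uses_private_docker_hub_images=True,
):
    print("-", step)
```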
If you experience any difficulties due to this change, or have trouble making the necessary adjustments, let us know in the comments. We will prioritize resolution of any issues related to this update.