Jar caching and enabling it

shlee Cambridge Member, Broadie ✭✭✭✭✭
edited August 2017 in Dictionary

When running GATK4 Spark jobs, we see in the standard output a message about caching the jar file.

Using GATK jar /Applications/genomicstools/gatk/gatk-4.latest/gatk-package-4.beta.2-spark.jar
jar caching is disabled because GATK_GCS_STAGING is not set

please set GATK_GCS_STAGING to a bucket you have write access too in order to enable jar caching
add the following line to you .bashrc or equivalent startup script

    export GATK_GCS_STAGING=gs://<my_bucket>/

Instead of uploading the local jar each time you run a command, you can cache the jar in a Google Cloud Storage bucket. To enable this, export GATK_GCS_STAGING=gs://your_bucket_name/some_folder_name/, e.g. by adding the export line to your .bashrc. gatk-launch then checks whether the jar you are invoking matches a jar already in this folder and, if it does, copies that jar from Google Cloud Storage to the Google cluster instead of uploading it from your local system. Here is an example export command.

    export GATK_GCS_STAGING=gs://spacecade7/gatk4/
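To make the "does the jar match" idea concrete, here is a minimal sketch of one way a hash-based cache name could be built. It is not taken from gatk-launch: the function name `staged_jar_path` and the `<basename>_<md5>.jar` naming scheme are assumptions for illustration, and it relies on `md5sum` being available (as on most Linux systems).

```shell
# Hypothetical helper, not gatk-launch's actual code.
# Builds a content-addressed object name for a jar under a staging prefix,
# assuming the name is <basename>_<md5>.jar appended directly to the prefix.
staged_jar_path() {
    local staging="$1" jar="$2"
    local base hash
    base="$(basename "$jar" .jar)"
    # md5sum prints "<hash>  <file>"; keep only the hash.
    hash="$(md5sum "$jar" | cut -d' ' -f1)"
    # Plain concatenation: without a trailing slash on the prefix,
    # the directory name and the jar name would run together.
    printf '%s%s_%s.jar\n' "$staging" "$base" "$hash"
}
```

With a name like this, a launcher could test whether the object already exists (for example with `gsutil -q stat "$(staged_jar_path "$GATK_GCS_STAGING" /path/to/local.jar)"`) and upload only when it does not. A changed jar produces a different hash and therefore a different object name.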

Uploads to Google Cloud Storage buckets are free, so the benefit here is time rather than cost. The feature helps most when your network connection is limited or slow, since uploads are generally much slower than downloads.

Remember to include the forward slash at the end of the bucket path and to use a dedicated directory, e.g. gatk4. Each GATK4 release’s jar has a different identifying hash, so as you upgrade from release to release, differently versioned jars accumulate. A dedicated directory keeps them from littering the rest of your bucket.
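As a small convenience for the two reminders above, the following sketch adds the trailing slash when it is missing and lists the jars that have accumulated in the staging directory. The function names `normalize_staging` and `list_staged_jars` are hypothetical, and the listing step assumes the `gsutil` CLI is installed and authorized for your bucket.

```shell
# Hypothetical helpers; bucket and directory names are placeholders.

# Append a trailing slash to a staging path if it is missing.
normalize_staging() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# List the jars currently cached under the staging directory.
# The wildcard is quoted so that gsutil, not the local shell, expands it.
list_staged_jars() {
    gsutil ls "$(normalize_staging "$1")*.jar"
}
```

For example, `export GATK_GCS_STAGING="$(normalize_staging gs://<my_bucket>/gatk4)"` sets the variable with the slash in place, and `list_staged_jars "$GATK_GCS_STAGING"` shows which versions are taking up space. Jars from old releases can then be deleted with `gsutil rm`.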
