Installing GATK4 via Conda

jounikuj · University of Eastern Finland · Member

Hi there! I have a small problem, or a suggestion for improvement, related to the use of (Mini)conda and GATK4. I'm not entirely sure if this forum is the right place to ask this because I don't really know how GATK4's Conda package is maintained, but let's give it a try!

So I'm using a wide variety of bioinformatics tools in my work, which is why I prefer Conda for package management: it makes it a little bit easier to handle package dependencies and package updates. I am now planning to try the new GATK4, as the version seems to be available in Bioconda. With GATK3 I was able to launch GATK simply with the command 'gatk', so I naturally tried the very same command for GATK4. However:

gatk -h
bash: gatk: command not found
gatk4 -h
bash: gatk4: command not found

I located the GATK4 .jar file and successfully tried the command:

java -jar /home/user/miniconda3/pkgs/gatk4- -h

This prints all available tools as expected. So the main problem seems to be that a shortcut to this .jar file is not included in the Conda distribution. Is there any particular reason for this behaviour, or is this just a bug in the package? It is, of course, possible to use GATK4 with the 'java -jar' command, but a simple 'gatk' or 'gatk4' command would be easier for Conda users. For example, if I update my GATK4 in the future, I must also update my pipelines so that my paths lead to the right .jar file. If I could use a direct 'gatk4' command instead, I could simply update GATK4 with Conda and launch it with 'gatk4' in my pipeline, without updating any paths manually.
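Until a launcher command is available, one way to avoid repeating the .jar path in every pipeline step is to keep it in a single variable. This is only a minimal sketch: `GATK_JAR` and `run_gatk_jar` are hypothetical names chosen for illustration and are not part of GATK or of the Bioconda package.

```shell
# Hypothetical helper: GATK_JAR is an assumed variable that you set yourself
# to the location of the GATK4 .jar file; it is not defined by any package.
run_gatk_jar() {
    if [ -z "${GATK_JAR:-}" ] || [ ! -f "$GATK_JAR" ]; then
        echo "Set GATK_JAR to the path of the GATK4 .jar file" >&2
        return 1
    fi
    # Forward all arguments unchanged to the jar.
    java -jar "$GATK_JAR" "$@"
}
```

With this in place, only the value of `GATK_JAR` has to change after an update, and pipeline steps would call `run_gatk_jar -h` and so on.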

Thank you!


  • Sheila · Broad Institute · Member, Broadie admin


    I am actually going to write a small tutorial on Conda. Let me get back to you soon with some helpful information after I find out more.


  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi @jounikuj, in the meantime see the instructions from the github readme:

    GATK uses the Conda package manager to establish and manage the environment and dependencies required by these tools. The GATK Docker image comes with this environment pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image, the conda gatkcondaenv.yml file is provided. To establish the conda environment locally, Conda must first be installed. Then, create the gatk environment by running the command conda env create -n gatk -f gatkcondaenv.yml (developers should run ./gradlew createPythonPackageArchive, followed by conda env create -n gatk -f scripts/gatkcondaenv.yml from within the root of the repository clone). To activate the environment once it has been created, run the command source activate gatk.
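The README steps quoted above can be sketched as a small shell function. The two conda commands are the ones given in the README; the function name `setup_gatk_env` and the pre-flight checks are illustrative assumptions, not part of GATK.

```shell
# Sketch of the README steps. setup_gatk_env is a hypothetical helper name.
setup_gatk_env() {
    # Default to the file name used in the README; pass another path if needed.
    env_file="${1:-gatkcondaenv.yml}"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found: Conda must be installed first" >&2
        return 1
    fi
    if [ ! -f "$env_file" ]; then
        echo "environment file not found: $env_file" >&2
        return 1
    fi
    # Command from the README: create the environment named 'gatk'.
    conda env create -n gatk -f "$env_file"
}

# Afterwards, activate the environment in your own shell, as the README says:
#   source activate gatk
```

Activation is left outside the function because it has to happen in the shell session where you intend to run the tools.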

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I should add -- we don't actually maintain the bioconda packages you were trying to use; they were created by someone else. We'll look into the possibility of contributing to their maintenance, but in the meantime we can't provide support for using them.

  • jounikuj · University of Eastern Finland · Member
    edited February 2018

    @Geraldine_VdAuwera said:
    I should add -- we don't actually maintain the bioconda packages you were trying to use; they were created by someone else. We'll look into the possibility of contributing to their maintenance, but in the meantime we can't provide support for using them.

    Thank you, Geraldine, for this information; it's sometimes hard to figure out who is actually maintaining Bioconda packages. It might be out of your scope, but I highly recommend that you consider Conda as a distribution channel as well: as GATK already uses Conda to establish and manage its environment and dependencies, it could be reasonable to also make an official GATK Conda package.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    I agree it makes sense and will bring this up with our release engineers. AFAIK it’s mainly a question of how we can integrate this into our current process.

    We also need to figure out who has been maintaining this so far; I see @bchapman has committed to the repository — Brad, any thoughts on this?

  • tstuber · Member

    When GATK4 is installed via conda, it is called with gatk-launch. For example: gatk-launch CreateSequenceDictionary -R my_refer.fasta. It appears this still lets one call GATK3 through the usual conda package named just gatk, keeping the two separate.

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    That doesn't sound right @tstuber. The gatk-launch convention is specific to the beta releases and you should NOT use this. Rather, please be sure the launch script is callable with gatk, as this reflects the official release of GATK4. Please download GATK4 via the Download link at the top menu.

  • tstuber · Member

    Thanks for the info. I'm glad to know gatk-launch is beta release syntax. The gatk4 version that runs when calling gatk-launch comes from the bioconda channel and is the same version that is available for download today (2/16/2018) from this website, so it seems to be up to date. Possibly conda is keeping it in "beta" to prevent breaking gatk3 updates.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    To be frank that sounds like a questionable way of maintaining backward compatibility — it’s going to lead to a lot of confusion like this. I would much prefer to see GATK4 provided as a separate package that follows our syntax. It truly is a separate software package so it would make more sense from a purist point of view anyway. We might just publish one ourselves.

  • arkanion · Singapore · Member

    FYI: in the newest version of gatk4 from the Bioconda repository, the executable has been switched back from gatk-launch to gatk.
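Since the launcher name has differed between package versions in this thread (gatk-launch in the beta-style package, gatk in the official release and the newer Bioconda package), a script can check which one is on the PATH. A minimal sketch; `find_gatk_launcher` is a hypothetical helper name, and the two launcher names are the only ones mentioned in this thread.

```shell
# Hypothetical helper: print the first GATK4 launcher name found on PATH.
# 'gatk' is tried first because it matches the official release syntax.
find_gatk_launcher() {
    for name in gatk gatk-launch; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    echo "no GATK4 launcher (gatk or gatk-launch) found on PATH" >&2
    return 1
}
```

A pipeline could then run `launcher=$(find_gatk_launcher) && "$launcher" --help` instead of hardcoding either name.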

  • FPBarthel · Houston · Member ✭✭

    Is there any plan to add it to Anaconda?

  • shlee · Cambridge · Member, Broadie ✭✭✭✭✭

    Hi @FPBarthel,

    The GATK is not involved in anaconda/bioconda packages. I am aware there is some lag with the versioned releases of GATK on bioconda but you'll have to ask the bioconda folks about their plans for

  • slament · Sweden · Member

    Hello, I just wanted to add for any new user that I've been using GATK in conda and it has been working pretty well so far. I installed it like:

    $ conda install -c bioconda gatk4=

    And that's it :)

  • bhanuGandham · Cambridge MA · Member, Administrator, Broadie, Moderator admin

    Hi @slament

    Thank you for helping out the community. Broad appreciates your contribution! :)
