Installing GATK4 via Conda

jounikuj · University of Eastern Finland · Member

Hi there! I have a small problem, or a suggestion for improvement, related to the use of (Mini)conda and GATK4. I'm not entirely sure if this forum is the right place to ask this because I don't really know how GATK4's Conda package is maintained, but let's give it a try!

So I'm using a wide variety of bioinformatics tools in my work, which is why I prefer Conda for package management: it makes it a little easier to handle package dependencies and package updates. I am now planning to try the new GATK4, as version 4.0.1.1 seems to be available in Bioconda. With GATK3 I was able to launch GATK simply with the command 'gatk', so I naturally tried the very same command for GATK4. However:

gatk -h
bash: gatk: command not found
gatk4 -h
bash: gatk4: command not found

I located the GATK4 .jar file and successfully tried the command:

java -jar /home/user/miniconda3/pkgs/gatk4-4.0.1.1-py36/share/gatk4-4.0.0.1-0/gatk-package-4-0.0.1-local.jar -h

This prints all available tools as expected. So the main problem seems to be that a shortcut to this .jar file is not included in the Conda distribution. Is there any particular reason for this behaviour, or is this just a bug in the package? It is, of course, possible to use GATK4 with the 'java -jar' command, but a simple 'gatk' or 'gatk4' command would be easier for Conda users. For example, if I update GATK4 in the future, I must also update my pipelines so that the paths point to the right .jar file. If I could use a direct 'gatk4' command instead, I could simply update GATK4 with Conda and launch it with 'gatk4' in my pipeline, without manual path updating.

Thank you!
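
As a stopgap for the path-updating problem described in the post above, a pipeline could locate the jar with a glob instead of hard-coding the versioned path. This is only a minimal sketch, not an official launcher: the function names `find_gatk_jar` and `gatk4` and the `GATK_ROOT` variable are hypothetical, and the `share/gatk4-*/gatk-package-*-local.jar` layout is assumed from the path quoted above and may differ between package builds.

```shell
# Hypothetical helper: print the path of a GATK4 local jar found under <root>.
# Assumed layout (taken from the path in the post above):
#   <root>/share/gatk4-*/gatk-package-*-local.jar
find_gatk_jar() {
    local root
    local candidate
    local jar
    root="$1"
    jar=""
    for candidate in "$root"/share/gatk4-*/gatk-package-*-local.jar; do
        # An unmatched glob stays literal, so check that the file really exists.
        if [ -f "$candidate" ]; then
            jar="$candidate"   # keep the last match in glob (lexicographic) order
        fi
    done
    if [ -z "$jar" ]; then
        return 1
    fi
    printf '%s\n' "$jar"
}

# Hypothetical wrapper: run GATK4 without a hard-coded jar path.
# GATK_ROOT is an assumed override; CONDA_PREFIX is set by an activated conda environment.
gatk4() {
    local jar
    jar=$(find_gatk_jar "${GATK_ROOT:-${CONDA_PREFIX:-}}") || {
        echo "gatk4: no GATK4 jar found" >&2
        return 1
    }
    java -jar "$jar" "$@"
}
```

With these functions sourced in a shell, `gatk4 -h` would then behave like the `java -jar ... -h` call above, provided the assumed directory layout holds for the installed package.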

Issue · Github
by Sheila

Issue Number: 2900
State: open

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator

    @jounikuj
    Hi,

    I am actually going to write a small tutorial on Conda. Let me get back to you soon with some helpful information after I find out more.

    -Sheila

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    Hi @jounikuj, in the meantime see the instructions from the github readme:

    GATK uses the Conda package manager to establish and manage the environment and dependencies required by these tools. The GATK Docker image comes with this environment pre-configured. In order to establish an environment suitable to run these tools outside of the Docker image, the conda gatkcondaenv.yml file is provided. To establish the conda environment locally, Conda must first be installed. Then, create the gatk environment by running the command conda env create -n gatk -f gatkcondaenv.yml (developers should run ./gradlew createPythonPackageArchive, followed by conda env create -n gatk -f scripts/gatkcondaenv.yml from within the root of the repository clone). To activate the environment once it has been created, run the command source activate gatk.
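
The README steps quoted above could be wrapped in a small shell function along the following lines. This is a sketch only: the function name `setup_gatk_env` and its pre-flight checks are illustrative additions, while the `conda env create -n gatk -f gatkcondaenv.yml` command itself is taken from the quoted text.

```shell
# Hypothetical wrapper around the README command quoted above.
# Usage: setup_gatk_env [path/to/gatkcondaenv.yml]
setup_gatk_env() {
    local env_file
    env_file="${1:-gatkcondaenv.yml}"

    # Conda must be installed before the environment can be created.
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found; install Conda first" >&2
        return 1
    fi

    # The environment definition file is provided with GATK.
    if [ ! -f "$env_file" ]; then
        echo "environment file not found: $env_file" >&2
        return 1
    fi

    # Command as given in the README text.
    conda env create -n gatk -f "$env_file"
}

# After the environment has been created, activate it as described above:
#   source activate gatk
```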

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    I should add -- we don't actually maintain the bioconda packages you were trying to use; they were created by someone else. We'll look into the possibility of contributing to their maintenance, but in the meantime we can't provide support for using them.

  • jounikuj · University of Eastern Finland · Member
    edited February 16

    @Geraldine_VdAuwera said:
    I should add -- we don't actually maintain the bioconda packages you were trying to use; they were created by someone else. We'll look into the possibility of contributing to their maintenance, but in the meantime we can't provide support for using them.

    Thank you, Geraldine, for this information; it's sometimes hard to figure out who is actually maintaining Bioconda packages. It might be out of your scope, but I highly recommend that you consider Conda as a distribution channel as well. As GATK already uses Conda to establish and manage its environment and dependencies, it would be reasonable to also provide an official GATK Conda package.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    I agree it makes sense and will bring this up with our release engineers. AFAIK it’s mainly a question of how we can integrate this into our current process.

    We also need to figure out who has been maintaining this so far; I see @bchapman has committed to the repository — Brad, any thoughts on this?

  • When GATK4 is installed via conda, it is called with gatk-launch. For example: gatk-launch CreateSequenceDictionary -R my_refer.fasta. It appears this still lets one call GATK3 through the usual conda package named just gatk, keeping the two separate.

  • shlee · Cambridge · Member, Broadie, Moderator

    That doesn't sound right @tstuber. The gatk-launch convention is specific to the beta releases and you should NOT use this. Rather, please be sure the launch script is callable with gatk, as this reflects the official release of GATK4. Please download GATK4 via the Download link at the top menu.

  • Thanks for the info. I'm glad to know gatk-launch is beta release syntax. The gatk4 version currently used when calling gatk-launch is 4.0.1.2 from the bioconda channel, the same version as is available for download today (2/16/2018) from this website, so it seems to be up-to-date. Possibly conda is keeping it in "beta" to prevent breaking gatk3 updates.

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie

    To be frank that sounds like a questionable way of maintaining backward compatibility — it’s going to lead to a lot of confusion like this. I would much prefer to see GATK4 provided as a separate package that follows our syntax. It truly is a separate software package so it would make more sense from a purist point of view anyway. We might just publish one ourselves.

  • arkanion · Singapore · Member

    In the newest version of gatk4 from the Bioconda repository, the executable has been switched back from gatk-launch to gatk. FYI.
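
Because the thread shows the Bioconda executable name changing between gatk-launch and gatk across package versions, a pipeline that has to run on machines with different package versions could probe for whichever launcher is on the PATH. The function below is a hedged sketch with a hypothetical name, `pick_gatk_launcher`; it prefers gatk, the name used by the official release, and falls back to gatk-launch only for older builds.

```shell
# Hypothetical helper: print the name of the first GATK launcher found on PATH.
# Prefers "gatk" (official release syntax) over "gatk-launch" (beta-era syntax).
pick_gatk_launcher() {
    local name
    for name in gatk gatk-launch; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    echo "neither gatk nor gatk-launch found on PATH" >&2
    return 1
}
```

A pipeline could then store the result once, for example `launcher=$(pick_gatk_launcher)`, and call `"$launcher" CreateSequenceDictionary -R my_refer.fasta` as in the example earlier in the thread.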
