I've been trying to implement the Best Practices workflow on my local Galaxy instance, but I get an error that says "Unable to access /please/GenomicsAnalysis.jar". Does anyone know how to fix it?
Which version of GATK are you using? Are you sure the path to the jar file is correct? What is the exact command you are running?
Hi, I'm using version 1.4, and I've fixed the problem: it was the path that needed to be configured. By the way, I don't need to type commands to run tools in Galaxy. Anyway, thanks for your help.
Please tell me you don't mean the GATK version is 1.4. That would be downright antique.
Haha, the version is actually 2.8.0. I just assumed it was 1.4 because it's under a directory named 1.4.
That's a bit better.
Though we are now on version 3.4, so if Galaxy isn't offering the latest version, you're missing out on some cool features. Can I ask why you chose to use GATK through Galaxy rather than directly from the command line?
Sure. I'm a master's student, and my dissertation is about designing and implementing a workflow in Galaxy and comparing it with the equivalent command-line workflow. Command-line tools are not very intuitive, especially for people who lack programming skills, so I'm trying to see whether Galaxy is easier to use. So far, though, implementing the Best Practices workflow in Galaxy hasn't been as easy as I expected: configuring the local Galaxy instance requires a lot of work, making GATK work in Galaxy requires additional configuration, and there have been problems here and there.
Is the GATK tool for Galaxy maintained by your team? May I ask why the GATK team doesn't keep it up to date?
The GATK team does not maintain Galaxy. If we did, we would make it up to date!
Interesting, thanks for your answer, Marc.
The difficulty with keeping GATK up to date in Galaxy is that someone has to generate wrappers for the tools every time a new version is released. If you're not familiar with this, wrappers are basically an extra layer of code that connects the interface to the underlying tools so that all the options are made available. With each new version you have to account for every change to the tools and their arguments. For a fairly simple command-line tool that's not a lot of work, but for a toolkit like GATK, which includes dozens of tools, each with many options, it's not a trivial undertaking.
As Sheila said, we don't produce the Galaxy wrappers for GATK ourselves; that work is done by members of the core Galaxy team. We have tried to make it a little easier by producing structured definition files that describe all tools and arguments in a way that would make it possible to automate the wrapper generation process, but that in itself is not trivial either. I think the Galaxy engineers just haven't had time to get around to it yet. They have a lot on their plate.
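To make the idea of automated wrapper generation a little more concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified tool-definition format (the field names are illustrative only, not the actual schema of the GATK definition files mentioned above) and emits just a skeleton Galaxy-style tool XML with `command`, `inputs` and `outputs` sections. A real Galaxy wrapper needs much more (requirements, help text, tests, conditional parameters), and `$gatk_jar` is a placeholder for wherever a given instance keeps the jar.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified tool definition. Field names are assumptions
# for illustration; the real GATK definition files have their own schema.
TOOL_DEF = {
    "name": "HaplotypeCaller",
    "version": "3.4",
    "arguments": [
        {"name": "input_file", "flag": "-I", "type": "data", "format": "bam"},
        {"name": "reference_sequence", "flag": "-R", "type": "data", "format": "fasta"},
        {"name": "stand_call_conf", "flag": "-stand_call_conf", "type": "float", "value": "30.0"},
    ],
    "output": {"name": "out", "flag": "-o", "format": "vcf"},
}


def build_wrapper(tool_def):
    """Return a skeleton Galaxy-style tool XML string for one tool definition."""
    tool = ET.Element("tool", id="gatk_" + tool_def["name"].lower(),
                      name=tool_def["name"], version=tool_def["version"])
    # Create <command> first so it precedes <inputs>; its text is filled in below.
    command = ET.SubElement(tool, "command")
    # $gatk_jar is a placeholder variable, not a real Galaxy setting.
    parts = ["java -jar $gatk_jar -T " + tool_def["name"]]

    # One <param> per argument, plus a matching "flag $name" on the command line.
    inputs = ET.SubElement(tool, "inputs")
    for arg in tool_def["arguments"]:
        attrs = {"name": arg["name"], "type": arg["type"], "label": arg["flag"]}
        if "format" in arg:
            attrs["format"] = arg["format"]
        if "value" in arg:
            attrs["value"] = arg["value"]
        ET.SubElement(inputs, "param", attrs)
        parts.append(arg["flag"] + " $" + arg["name"])

    # A single output dataset, wired to the tool's output flag.
    out = tool_def["output"]
    parts.append(out["flag"] + " $" + out["name"])
    command.text = " ".join(parts)
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name=out["name"], format=out["format"])
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_wrapper(TOOL_DEF))
```

The point of a generator like this is that rerunning it against each new release's definition file would pick up added or renamed arguments automatically, which is exactly the manual work described above; the hard part in practice is mapping every GATK argument type onto Galaxy's parameter types.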
On our end, we are now working on a cloud-based platform, built in collaboration with Google, that will offer GATK-as-a-service. The ultimate vision is that people will be able to simply run our predefined Best Practices pipelines on their data with minimal effort. Hopefully that will take a lot of the pain out of the process for the significant number of users for whom it doesn't make sense to go through the effort of setting up their own pipelines.
Thank you both for your answers. May I ask what you two think of the Galaxy platform? I've been implementing the pipeline in my local instance on a cloud virtual machine, and Galaxy has given me the feeling that, at least, a local instance is not as easy to configure as I thought it would be. And as for GATK-as-a-service, is it going to have a graphical interface, or will it run from the command line?
I haven't used Galaxy myself; I think Galaxy addresses a need in the community and as such can be a useful tool, but as you say, setting up your own instance is not trivial, so it's mostly useful as something made available by an institution or consortium to a research community (either internal or public).
The main downside is that Galaxy's need for constant updating of the wrappers is a huge weakness, especially in a field like ours where the software is still evolving rapidly and it's important to be able to take advantage of the latest features, bug fixes and so on.
One other comment is that while Galaxy makes sense to me as a pipelining system of sorts, I worry that it's too often used as a crutch to avoid running command-line tools from the command line. I am very sympathetic to the aversion or anxiety that many researchers not trained in the use of command-line tools feel on this topic, because I myself was one of them for many years (wet-lab microbiologist, 10 years of hard labor at the bench before I went to the dark side). But learning to run command-line tools from the command line is a powerfully liberating experience, and I wish everyone could have the opportunity to experience it. It is not as difficult or as complicated as it may seem if you find the right educational materials for how you think and work. The Software Carpentry organization does a fantastic job of organizing workshops that teach scientists to use command-line tools, version control and all sorts of good stuff. I encourage you and anyone else in this sort of situation to check them out.