
Can I use different versions of the GATK at different steps of my analysis?

edited December 2017 in Frequently Asked Questions

Short answer: NO, unless it's a combo that we made.

Medium answer: No, at least not if you want to run a low-risk pipeline, unless it's a workflow that we have explicitly validated and published on GitHub.

Long answer: see below for details.

The rationale

There are several reasons why you might want to do this. Perhaps you're using the latest version of GATK and one of the tools has a show-stopping bug, so you'd like to use an older, pre-bug version of that tool while keeping the latest version of all the other tools. Or perhaps you've been using an older version of GATK and you'd like to try a new tool, but keep running the rest of the pipeline in the version you've already used to process hundreds of samples. In fact, our current production pipeline for germline short variant calling does exactly that. The difference is that we vetted that particular combination of versions extensively, until we were satisfied that it was safe.

The problem: compatibility is not guaranteed

In many cases, when we modify one tool in the GATK, we need to make adjustments to other tools that interact, directly or indirectly, with the data consumed or produced by the upgraded tool. If you mix and match tools from different versions of GATK, you risk running into compatibility issues. For example, GenotypeGVCFs has certain expectations about the GVCFs it gets from HaplotypeCaller. If the information is formatted differently from what GenotypeGVCFs expects, it can crash or, worse, silently do the wrong thing without telling you there's a problem.
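If you inherit files and aren't sure which versions produced them, it can help to look at the provenance GATK usually records in the VCF/GVCF header. Below is a minimal sketch, not an official GATK utility: the function names are hypothetical, and it assumes the header carries `##GATKCommandLine` lines with `ID=` and `Version=` fields. The exact layout differs between GATK3 and GATK4, so the regular expressions are assumptions, not a documented parser. A mixed result is only a prompt to check whether that combination was validated; it does not test compatibility.

```python
import gzip
import itertools
import re
import sys
from typing import Dict, Iterable, List, Set

# Assumed header shapes (GATK3 and GATK4 differ slightly), for example:
#   ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.8-0-...,...>
#   ##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="...",Version="4.0.0.0",...>
_ID_RE = re.compile(r'[<,]ID=([^,>]+)')
_VERSION_RE = re.compile(r'[<,]Version="?([^",>]+)"?')


def gatk_versions(header_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each tool ID found in ##GATKCommandLine lines to the versions seen."""
    found: Dict[str, Set[str]] = {}
    for line in header_lines:
        if not line.startswith("##"):
            break  # meta-information ends at the #CHROM line
        if not line.startswith("##GATKCommandLine"):
            continue
        tool = _ID_RE.search(line)
        version = _VERSION_RE.search(line)
        if tool and version:
            found.setdefault(tool.group(1), set()).add(version.group(1))
    return found


def distinct_versions(found: Dict[str, Set[str]]) -> Set[str]:
    """All version strings recorded, across every tool."""
    return set().union(*found.values())


def read_header(path: str) -> List[str]:
    """Read only the ## meta-information lines of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(itertools.takewhile(lambda l: l.startswith("##"), handle))


if __name__ == "__main__":
    # Usage (hypothetical script name): python gatk_header_versions.py a.g.vcf.gz b.vcf
    all_versions: Dict[str, Set[str]] = {}
    for path in sys.argv[1:]:
        for tool, versions in gatk_versions(read_header(path)).items():
            all_versions.setdefault(tool, set()).update(versions)
    for tool, versions in sorted(all_versions.items()):
        print(f"{tool}: {', '.join(sorted(versions))}")
    if len(distinct_versions(all_versions)) > 1:
        print("WARNING: more than one GATK version recorded in these headers")
```

Whether upstream provenance lines (for example, the HaplotypeCaller line from an input GVCF) are carried into a downstream output depends on the tool and version, so passing every intermediate file, not just the final VCF, gives a more complete picture.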

But what if the tools/tasks are in unrelated workflows? What if the tool hasn't been ported to GATK4?

Well, that might be fine, but we encourage you to avoid it if you can. If you have any doubt, feel free to ask us; we may be able to suggest alternatives.

The conclusion

We don't support using tools from different versions within the same workflow unless we've vetted the combination. We don't think it's worth the risks. If there's a show-stopping bug, let us know and we promise to fix it as soon as (humanly) possible. Otherwise, either accept that you're stuck with the version you started your study with (we may be able to help with workarounds for known issues), or upgrade your entire workflow and restart your analysis from scratch. Depending on how far along you are, one of those options will be less painful for you; go with that one.

The plea bargain, and a warning

If, despite our dire warnings, you're still going to mix and match tool versions, fine, we can't stop you. But be really careful, and read the release notes for every version involved. And keep in mind that when things go wrong, we will deny you support if we think you've been reckless.



  • tytolin (Member)

    Hello, GATK

    I know that it is inappropriate to use different versions of GATK at the same time.
    However, some arguments, like --concordant in SelectVariants, have been removed or replaced by other arguments.
    Should I use an older version of GATK to check the concordance of two VCF files produced by different variant callers?

