The current GATK version is 3.2-2

# Upcoming changes to the license & the retirement of GATK Lite by v 2.4

edited September 2

Note: the license has been updated for GATK 3.x. See details here.

If you never got the point of GATK Lite and you hated the 2.0 license... Oh, do we have good news for you!

First, a little bit of context. When we released GATK 2.0, the GATK had emerged as the leading research software package in its domain. Public demand for tech support was rising rapidly; not only from the academic research community as it had in the past, but also from researchers using the software in a for-profit context. These latter users have specific needs (quality assurance, process certifications, etc.) that we are ill-equipped to address.

This drove us to seek a partnership with a company called Appistry which could release and license the GATK as a commercial software product appropriate for use in a for-profit and regulatory-compliant setting. We knew this solution would better meet customer needs, while alleviating our support burden and allowing us to focus on our core constituency, the academic and non-profit research community. This plan also had the prospective benefit of leveraging the intellectual property of the GATK (much of which results more or less directly from public investments) to fund the continuation of our research and development activities.

However, we knew it would take us and our partners at Appistry some time to develop a mature commercial product. So as an interim solution, we enacted a more restrictive license, closed part of the source code on the “Full GATK” release, and provided a “Lite” version to enable for-profit users to keep working with an up-to-date version of the GATK (albeit without the cutting-edge tools that were introduced in version 2.0). Of course, the GATK programming framework (the GATK engine, libraries, and basic data management tools) remained open source under the MIT license.

Well, we got a lot of feedback from the user community over these changes. We listened carefully, took the criticism to heart, and realized our interim solution left much to be desired. First, closing part of the source code was a deeply unpopular move. Many of you pointed out that this might restrict academic knowledge and obstruct progress in the field of algorithmic research. Second, we did a poor job of communicating the purpose of Lite and how it differed from the Full version. Even though Lite was always intended as an interim solution, some organizations opted to adopt it instead of the Full version and seem to view it as a viable long-term solution for genetic analysis. Related to this, we found that maintaining the two different distributions gave us our share of headaches in terms of supporting and updating the toolkit.

In light of these considerations, we’re going to change things up again, hopefully for the better!

In a nutshell: no more Lite and a new license (attached) that restores free access to the source code for those in the community performing academic non-commercial research. That’s right, free as in beer! You’ll still have the option of downloading the packaged binary (i.e., the “ready-to-run” program) from our website as you did before, but you’ll also be able to get the full source code (programming framework AND all tools including the latest and greatest) straight from the Github repository if you want. You can set it up on a server and provide it as a service to other non-profit users within your organization. You can dig into our deepest secrets to find out what makes ReduceReads and the HaplotypeCaller tick. And feel free to send us patches if you find a way to improve the code!

The following figure summarizes the different packages and their corresponding licenses.

Note that if you are using a version of GATK-Lite, you may continue using it, but we will be making no more updates to Lite after 2.3. Thus, if you choose to stay with Lite, you will be using an outdated version of the toolkit and you won’t benefit from any further improvements made to the GATK with the 2.4 release and in the future.

We welcome any and all comments on these new changes, which are due to take effect with the upcoming release of version 2.4 (tentatively scheduled for early February). There’s still time to tweak the language of the license if you spot any issues we’ve overlooked.

--

Edits:

1. Removed "free as in speech" since the distribution of modified source is restricted to within academic institutions.

3. Added a TXT format version for those who don't want to use DOC.

4. Corrected the license file version.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

• Posts: 11Member

It is great that you are returning to a full open-source model. This was always a contentious issue and I am glad you realised your actions were wrong and reversed course. However, I have to disagree when you say that companies, big and small, can purchase a license for the software through APPISTRY. I am sure I cannot reveal prices here on this forum, but this is stretching the truth quite a bit, since I have learned about the pricing structure of the license and it is NOT compatible with what I would define as a small or even medium-sized company.

• Posts: 19Member
edited January 2013

Geraldine; That's great news about open-sourcing the code for HaplotypeCaller and ReduceReads. Thank you for listening to the community response to GATK 2.

Could you clarify which parts of GATK require obtaining a license for commercial use? Does this apply to all parts of the toolkit: variant calling, recalibration/realignment, and data management? Or can some functionality be used in commercial settings without a license? This distinction was clear to me with the GATK/GATK-lite toolkits. Now it sounds like the line is drawn somewhere between SelectVariants and HaplotypeCaller but I'm not sure where.

My concern is that we develop open source pipelines that could be used in both academic and commercial settings. I'd like to be able to use GATK and give a clear distinction about which parts will require a license. Currently we do this via checking for full/lite:

Will there be a way to distinguish in GATK 2.4? Thanks as always for the work on GATK.

Post edited by chapmanb on

Brad Chapman, Bioinformatics Core at Harvard School of Public Health

edited January 2013

@Pepetideo said: However, I have to disagree when you say that companies, big and small, can purchase a license for the software through APPISTRY. I am sure I cannot reveal prices here on this forum, but this is stretching the truth quite a bit, since I have learned about the pricing structure of the license and it is NOT compatible with what I would define as a small or even medium-sized company.

Hi Pepetideo,
To be honest, we developers have no idea what the pricing structure is; that is all Appistry's responsibility. If it is so problematic, I recommend you talk to Appistry about your concerns. I expect if they get enough people telling them this they may look for a solution that meets your needs better.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

edited January 2013

@chapmanb said: Could you clarify which parts of GATK require obtaining a license for commercial use? Does this apply to all parts of the toolkit: variant calling, recalibration/realignment, and data management? Or can some functionality be used in commercial settings without a license? This distinction was clear to me with the GATK/GATK-lite toolkits. Now it sounds like the line is drawn somewhere between SelectVariants and HaplotypeCaller but I'm not sure where.

The programming framework, which includes the engine and infrastructure / data management, is the only thing that is completely free for all, as per the MIT license. That hasn't changed. What has changed is that previously some analysis tools (walkers) were released (as GATK-Lite) under a Broad license that made them free for all, but now we are no longer going to offer Lite. So all the analysis tools are now under the new restricted Broad license, which means they are free for academic research, but commercial use requires purchasing an Appistry license.

As to how this applies to your use of the GATK license, I believe either you or your commercial customers will need to acquire a license for that use. Note that you can continue using your existing pipeline to offer Lite tools up to v2.3 to commercial users. But this is beyond the limits of my turf so I urge you to contact either Appistry or Issi Rozen at Broad to figure out what would be the best solution for you.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

edited January 2013

@eparejatobes said: just some questions:

1. do you have anything like a contribution policy?
2. any external contributions from outside BROAD to your current codebase (that you are dual-licensing)?
3. in that case, do they agree with this? did they explicitly share/release their rights in some way?

External contributions have typically been made to the engine and infrastructure, which are entirely open and free under the MIT license. AFAIK all contributions to walkers (the tools that are covered by the restricted license) have been made by direct collaborators within the scope of Broad affiliates, which are covered by specific IP agreements.

Side-effect of the forum setup that I forgot about before posting (in case you're wondering, we ask users to register because it helps us get an idea of the size and composition of our user base, which in turn helps us allocate resources for support). I've added a link in-text. Also added a TXT format version.

You say that the commercial version will include "quality assurance", and I gather from the Appistry website that one feature is "controlled releases". Does this imply that the "academic" variant has none of these? If not, how could this possibly justify the need for a commercial version?

That's correct -- we test the code, but we don't test for every use case as thoroughly as business-level QA does. And we have a very fast development cycle, with a new release every six weeks on average, which means we release a lot of bugs, which we then fix with patches as we find them (thanks in large part to users who apply the tools and report back when they find bugs). There's a tradeoff between using bleeding-edge tools and having bug control. The academic package is research software, and we do our best to make our users aware of that. Most are fine with it or just use a slightly older version. But companies typically don't want to or can't work with that. We know there's a need for a commercial version because companies have asked us for it.

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

@pjacock said: As discussed on Twitter, the described academic non-commercial licence is NOT "free as in beer AND free as in speech" (as the announcement currently states). This is not "free as in speech". This is not Open Source. This would not be OSI approved.

That's a fair point, and I've amended the text accordingly.

Free for academic use only licences are just trouble - for another thing at what point do you draw the line with things like joint projects funded by both government research councils and industrial partners being increasingly common?

That's a good question for Appistry and Issi Rozen -- I believe there is a plan for that configuration, and if you ask I'm sure you'll get definitive answers out of them.

Originally GATK was open source (MIT), and all was well. Then you introduced a closed GATK v2 but continued the open source GATK v1 as "GATK Lite". People complained publicly, many saying they would reject the new terms and continue with other open source tools - this included researchers at the Sanger Institute.

We worked with the Sanger to revise the original GATK license to satisfy their concerns, and our takeaways from that conversation helped drive the new GATK license.

Now you're completely abandoning any open sourced version of GATK?!

Please keep in mind that the GATK programming framework remains completely open-source under the MIT license. Anyone is free to use it to make their own tools work on top of that. This is not a minor asset, it is quite powerful and valuable.

Also asked on Twitter: Why not follow the proven approach demonstrated by companies like RedHat - keep GATK under an open source licence, and sell commercial support as an option?

Different business, different rules? We developers are not in a position to discuss the choice of business strategy; again I would suggest taking this up with Issi Rozen.

Geraldine Van der Auwera, PhD

• Posts: 3Member

@Geraldine_VdAuwera said: Is that a bad thing?

it could be for you :) I am not a lawyer / this is not legal advice but I don't think you can easily incorporate those bug fixes/modifications into the codebase you are dual-licensing, if your academic contributors are retaining copyright/IP over their contributions. You know, this would be really simple to manage in an open-source scenario :) but that's not your case.

@Geraldine_VdAuwera said: That's correct -- we test the code, but we don't test for every use case as thoroughly as business-level QA does. And we have a very fast development cycle, with a new release every six weeks on average, which means we release a lot of bugs, which we then fix with patches as we find them (thanks in large part to users who apply the tools and report back when they find bugs). There's a tradeoff between using bleeding-edge tools and having bug control. The academic package is research software, and we do our best to make our users aware of that. Most are fine with it or just use a slightly older version. But companies typically don't want to or can't work with that. We know there's a need for a commercial version because companies have asked us for it.

So, what about those companies which do research? they pay taxes, so that you get funds, and now they need to pay again, in the name of some need that you think they have?

.. "develop the cutting edge software tools to analyze those data. And we are committed to sharing all of it openly with the entire scientific community."

I really think that it needs some rewriting :)

Last, but not least, it is sad that you (a developer) are taking care of this. You've been really nice and informative, even if probably you don't personally agree with any of this (no need to answer to this of course :) ). thanks!

@eparejatobes said: it could be for you :) I am not a lawyer / this is not legal advice but I don't think you can easily incorporate those bug fixes/modifications into the codebase you are dual-licensing, if your academic contributors are retaining copyright/IP over their contributions. You know, this would be really simple to manage in an open-source scenario :) but that's not your case.

Well, the point is that we get truly external contributions only to the framework, which is open-source (MIT), so it's actually not a problem. Otherwise I agree it would be more complicated.

So, what about those companies which do research? they pay taxes, so that you get funds, and now they need to pay again, in the name of some need that you think they have?

Not to go into this in detail (we had a long conversation about this on the forum last summer), but there are a number of counter-arguments to this. For example, a French company doesn't pay taxes to fund us -- so should we give the software to American companies, but not to the French? You could also say the general public pay taxes so we get funds (and individual tax rates are higher than corporate taxes), so if the fruit of their taxes goes to companies for free, why should the public then have to pay for the companies' products and services? The point here is that we add value to the original investment, those companies also add value by their activity; and our respective products (ideally) have a price point that reflects both the taxpayer's contributions and the added value.

Re: the perception of a need, we don't make this stuff up; our partners did their due diligence on market research before starting on this path.

Last, but not least, it is sad that you (a developer) are taking care of this. You've been really nice and informative, even if probably you don't personally agree with any of this (no need to answer to this of course :) ). thanks!

Ah, thanks, but don't be too sad for me -- this communication stuff is actually pretty interesting, it's part of my job and beats the pants off of postdoc limbo (my previous life) ;-)

Geraldine Van der Auwera, PhD

• Posts: 3Member
1. companies (particularly small ones doing a lot of R&D/pure research) are made of individuals, who pay taxes too, like every other person :)
2. You can choose whether you want to pay for company X's products and services. You're forced to pay for BROAD developing GATK, in the form of taxes.
3. Not saying that this particular need (for QA, commercial support etc) doesn't exist; but this is completely unrelated to licensing. In fact, if there's such a need, your move is actually a negative one in that respect; in an open-source scenario, any company (not just Appistry) could provide support and a controlled release cycle, tests, whatever; and if there's really a market for that, they would do well. Instead, right now, what we have is a monopoly in terms of offering such a thing to interested companies.
4. And, again, what about companies which do pure research? they're perfectly happy with the more "bleeding-edge" release style, but now they cannot use that, just because of where they work? What's the difference between doing pure research for a salary (universities/public institutions), and doing pure research as a service (companies providing research-based services)?

best

@eparejatobes said: Re Geraldine_VdAuwera 1. companies (particularly small ones doing a lot of R&D/pure research) are made of individuals, who pay taxes too, like every other person :)

Um, we're not asking the individual employees to pay for the software, we're asking the company to pay for it. I'm sure we can agree there is a big difference there...

2. You can choose whether you want to pay for company X's products and services. You're forced to pay for BROAD developing GATK, in the form of taxes.

You're also forced to pay for cancer research, and you may never develop cancer yourself. You're also forced to pay for public transportation subsidies, which you may never use. You're also forced to pay for aid to foreign countries. You're also forced to pay for disaster relief in another state. That's part of living in a society where everyone pitches into the common pot, you don't always choose what the money goes to, but if it's run sanely, it evens out.

3. Not saying that this particular need (for QA, commercial support etc) doesn't exist; but this is completely unrelated to licensing. In fact, if there's such a need, your move is actually a negative one in that respect; in an open-source scenario, any company (not just Appistry) could provide support and a controlled release cycle, tests, whatever; and if there's really a market for that, they would do well. Instead, right now, what we have is a monopoly in terms of offering such a thing to interested companies.

Keep in mind that there are other software packages that do the same thing, no one is forced to use the GATK.

4. And, again, what about companies which do pure research? they're perfectly happy with the more "bleeding-edge" release style, but now they cannot use that, just because of where they work? What's the difference between doing pure research for a salary (universities/public institutions), and doing pure research as a service (companies providing research-based services)?

It's actually not at all about where you work, it's what the software is used for. If you are doing work that does not contribute to generating revenue, you can use it without a license. But companies providing research-based services make a profit off those services. At least I hope for them they do, or they need to revise their business plan! ;-)

Geraldine Van der Auwera, PhD

• Posts: 11Member
edited January 2013

Well, in short, the Broad Institute has given their hard-earned efforts in developing GATK to a single company that, from what I can gather, has complete control over how much they are willing to charge companies for using YOUR software!

I do hope they have been paid handsomely for this privilege.

I predict that my company will transition away from GATK, as will most small and medium companies. We simply have no way of accepting what Appistry is charging for a single-user license.

What happened to the idea of offering a GATK license to companies that would apply to have a "powered by GATK" style certification? That would probably bring in a lot of money and would require very little effort on your part

Post edited by Pepetideo on

@Pepetideo said: Well, in short, the Broad Institute has given their hard-earned efforts in developing GATK to a single company that, from what I can gather, has complete control over how much they are willing to charge companies for using YOUR software!

The terms of our agreement with Appistry were decided bilaterally to our mutual satisfaction. As a representative of the development team, I have little knowledge of the details, but our business development people have all the knowledge and control they need. For my part, I have worked closely with the Appistry people in the initial stages of this partnership and I have complete confidence in their professionalism and commitment to providing significant added value to the commercially licensed GATK.

I predict that my company will transition away from GATK, as will most small and medium companies. We simply have no way of accepting what Appistry is charging for a single-user license.

That is your prerogative and I wish you success in your undertakings.

What happened to the idea of offering a GATK license to companies that would apply to have a "powered by GATK" style certification? That would probably bring in a lot of money and would require very little effort on your part

That is still a possibility; but if we do it we'll be doing it to protect the public from bad implementations of our software, because we know there are some out there. And if/when we do it, it won't be a "minimal effort" undertaking. Certifying implementation of complex software like the GATK is not trivial, if it is to mean anything.

Geraldine Van der Auwera, PhD

• Posts: 11Member

@Geraldine_VdAuwera said: And if/when we do it, it won't be a "minimal effort" undertaking

I did not mean this in a derogatory way. Apologies if it sounded that way. I just meant that it would be rather straightforward to create a mechanism to certify some analysis pipelines as compliant with your best practices for the GATK software.

• Posts: 85Member ✭✭✭

I have a question of practical nature in regards to paragraph 2.2 of the new licence. The licence states:

LICENSEE shall not sublicense or distribute the PROGRAM, in whole or in part, without prior written permission from BROAD.

I'm wondering how this relates to my (and the 40-or-so other) fork of GATK on github. Will the source released on github be restricted to the framework, or will it also contain the analysis tools? Does this mean that I will need written permission from Broad to host a publicly available fork or am I missing something here?

@Johan_Dahlberg said: I have a question of practical nature in regards to paragraph 2.2 of the new licence. The licence states:

I'm wondering how this relates to my (and the 40-or-so other) fork of GATK on github. Will the source released on github be restricted to the framework, or will it also contain the analysis tools? Does this mean that I will need written permission from Broad to host a publicly available fork or am I missing something here?

Hi Johan,

Re: access, the entire source will remain in github, so as long as you're using it for non-profit academic research you don't need to change anything. Not sure about the steps for having forks be public on github -- I will look into it and let you know.

Geraldine Van der Auwera, PhD

• Posts: 19Member

That was bad news for me. I'm in between the commercial and academic world and this change in policy kind of blocks our academic research, as it is accompanied and financed by commercial interests.

About keeping to the last free (as in beer) version GATKLite 2.3, will you keep support in your forums? And if so, for how long?

@priesgo said: That was bad news for me. I'm in between the commercial and academic world and this change in policy kind of blocks our academic research, as it is accompanied and financed by commercial interests.

I strongly recommend you talk to Appistry about your case. You are most certainly not the only one in this situation and I'm sure they can help you find out exactly how the restrictions affect your activities. They respond within 24 hrs to all queries made using the online contact form I linked to above.

About keeping to the last free (as in beer) version GATKLite 2.3, will you keep support in your forums? And if so, for how long?

Unfortunately we won't be able to provide support for 2.3 once we move on to 2.4, as per the policy we've always had on the forum. You can always ask usage questions about the parts that don't change, and I'm preparing a versioned documentation system so it will be possible to refer to older docs. But if you run into bugs that are not in 2.4, I'm afraid you'll be on your own.

Geraldine Van der Auwera, PhD

• Posts: 85Member ✭✭✭

@Geraldine_VdAuwera said: Not sure about the steps for having forks be public on github -- I will look into it and let you know.

As far as I'm aware there is no way to disable forking for a public github repository, so if free creation of public forks is not an option, I'm not sure that keeping the code on github will be an option. Personally, I think that it would be a step backwards to remove the option of accessing the code that way as it would make it more difficult for those of us who are (in a very modest way) contributing to the code base.

@Johan_Dahlberg said: As far as I'm aware there is no way to disable forking for a public github repository, so if free creation of public forks is not an option, I'm not sure that keeping the code on github will be an option. Personally, I think that it would be a step backwards to remove the option of accessing the code that way as it would make it more difficult for those of us who are (in a very modest way) contributing to the code base.

I agree. Rest assured that we appreciate your contributions to the framework and we remain fully committed to keeping the source on Github.

Geraldine Van der Auwera, PhD

A couple of follow-up points:

@priesgo, after discussion of your case I was told that the license specifically mentions the case of academic labs that get funding from the private sector; that is not considered a commercial activity. I would still recommend you check with Appistry, but based on this it is extremely likely that you can use GATK for free under the Broad license to academics.

@Johan_Dahlberg, I've discussed your questions with the group and I can confirm that we will continue providing the source code through Github indefinitely. In addition, we have no problem with your having a forked version that is publicly accessible on Github as long as it is clearly identified as a fork of the GATK and retains the license text. We will put together a text that explains exactly what are the requirements and restrictions (mainly to keep our legal people happy) but basically you don't need to change anything.

Geraldine Van der Auwera, PhD

• Posts: 4Member

I have to agree with @pjacock, this is a terrible decision and a terrible move by Broad.

Your role in the world is not to make money, your role is to produce fantastic data and fantastic software that benefits the wider good of the public.

I appreciate your comment that perhaps supporting commercial enterprise was getting expensive, so why not just charge for the support and leave everything else as true open source?

A lot of things seem to have changed in the attitude of Broad recently - is this due to funding changes imposed by your funders? Are you being charged to become more sustainable? I tell you, charging for GATK licenses is not the way to do it.

Take my use-case - I do work for academics, I do work for industry. So I need a single pipeline for variant analysis. If that single pipeline is to use GATK, then I would need some sort of license - so forget it. I will use samtools and the plethora of other genuinely open source tools around.

I don't think you guys have any idea how much damage these decisions are having on your reputation in the bioinformatics community.

• Posts: 4Member

Just read the post above mine and see that my "use case" may well still fall under academic work. i) I would be very worried that I would adopt this and then you would change the license and I would be screwed, and ii) I am still morally outraged that you're doing it at all.

• Posts: 1Member

I think Mick's comment is especially true, the Broad seem to be changing their mind on things with every release, staggering from one idea to the next. Who is to say that in 6 months this isn't going to change again? Especially if Appistry fail to deliver on licence sales..

While I certainly appreciate your concerns about open source, ownership of the project, and stability of our decisions, I don't think either of you will be satisfied with anything less than free (not just open) source software. We are of course happy if you'd like to use another software package, because of moral open/free software requirements or for the need for long-term project stability. Certainly if you want a purely open sourced data processing tool, you won't find it in the GATK. You are more than welcome to go out and create your own, or to use another already existing package; nothing is stopping you.

Our view is that we need a commercial version of our software in order to support the many current and upcoming users who need long-term, commercial-grade software. We are under no obligations from our funders, or from anyone at the Broad, to commercialize the GATK. We are simply responding to a need for a more stable, better supported version of the GATK, and our view is that partnering with Appistry is the best way to accomplish this goal.

-- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

I sympathize with your anxiety about future changes. But please understand that this has been a transitional period for us too, and we've had to scale up from being a small team producing software internally for a fairly modest number of researchers within our institute, to serving thousands of researchers worldwide (but still as a small team!). We made some mistakes in setting the original course six months ago, and now we're trying to make this work better. Would you really prefer us to stick with a flawed model even though we know we can serve people better by adjusting it, just in the name of not changing anything?

We honestly don't want to have to make any more changes to our model after this, which is why this time we're asking for feedback before going through with it. If there's any aspect we can tweak to make it work better for people, we'll seriously consider it, even if we can't accommodate every request (a complete change of model is simply not going to happen).

Geraldine Van der Auwera, PhD

• Posts: 4Member

@Mark_DePristo Thanks for the response. I still don't really see why, if support is the issue, you cannot adopt the model of charging for support (a la MySQL, Red Hat etc).

I think time will tell if this is a smart move, however, my opinion (after 15 years in bioinformatics, in industry and academia) is that it is not.

@Geraldine_VdAuwera I suspect samtools, bwa etc also support 1000s of users worldwide, and operate as free, open-source tools, supported by a small team.

I just do not like this move. Bioinformatics, since it began, has been driven by free, open-source software. I will not support any move to change that, to be honest.

• Posts: 19Member

@priesgo, after discussion of your case I was told that the license specifically mentions the case of academic labs that get funding from the private sector; that is not considered a commercial activity. I would still recommend you check with Appistry, but based on this it is extremely likely that you can use GATK for free under the Broad license to academics.

I appreciate your response @Geraldine_VdAuwera but my case is not exactly academic work privately financed, it is more like the use case presented by @biomickwatson. If it can't be used later in a commercial setting, the partnership ends.

In any case I respect your decision.

• Posts: 19Member
edited January 2013

Mark and Geraldine;

This is a useful discussion, thanks for all the thoughts about your decision making process. If you're still open to modifications to the model one practical suggestion is to make only a subset of your high value walkers fall under the commercial/academic license, while the more framework-based walkers continue to be MIT licensed. The value of using GATK as a platform diminishes if commercial users require a license to run any of the GATK walkers. As a simple example, all of my open-source custom annotators now require a license to run in commercial settings.

I've been a user of the GATK since early 1.0 versions so have significant expertise and infrastructure built up around the tools and API, in addition to making use of variant calling capabilities. The goal I'll hope to reach with new GATK versions is to offer commercial users of my pipelines a choice to swap into GATK functionality as needed. As currently structured, this is difficult and requires either maintaining two concurrent pipelines or removing most of the GATK infrastructure walkers. For a lot of functionality like annotating, selecting, and manipulating variants there are equivalent tools or roll-your-own options but I've chosen to use GATK because of familiarity and trust.

I know you don't owe your users anything, but it would make the transition less painful for those of us with a more open-source direction to consider increasing the proportion of the toolkit that is MIT licensed. This is not antithetical to your stated goals since the GATK-specific tools would still be dual licensed. Those of us who had thought of GATK as a community developed resource and helped with patches, documentation and question answering feel a bit blindsided by the multiple changes in direction. For me, keeping more of GATK as MIT licensed would help with the transition.

Thanks again for listening to feedback.

Post edited by chapmanb on

Brad Chapman, Bioinformatics Core at Harvard School of Public Health

@chapmanb said: it would make the transition less painful for those of us with a more open-source direction to consider increasing the proportion of the toolkit that is MIT licensed. This is not antithetical to your stated goals since the GATK-specific tools would still be dual licensed.

Brad, you make a very good point here. I'm happy to report that after discussing this internally, we've decided to include the utility-type walkers in the MIT-licensed framework. Which means all walkers except the following core GATK-specific tools, which will remain dual-licensed (Broad/Appistry):

• BaseRecalibrator
• RealignerTargetCreator
• IndelRealigner
• UnifiedGenotyper
• HaplotypeCaller
• VariantRecalibrator
• ApplyRecalibration
• VariantAnnotator annotation modules (not the walker itself)

Will this help solve your problem?

Geraldine Van der Auwera, PhD

• Posts: 5Member

@Geraldine_VdAuwera said: That's correct -- we test the code, but we don't test for every use case as thoroughly as business-level QA does. And we have a very fast development cycle, with a new release every six weeks on average, which means we release a lot of bugs, which we then fix with patches as we find them (thanks in large part to users who apply the tools and report back when they find bugs). There's a tradeoff between using bleeding-edge tools and having bug control. The academic package is research software, and we do our best to make our users aware of that. Most are fine with it or just use a slightly older version. But companies typically don't want to or can't work with that. We know there's a need for a commercial version because companies have asked us for it.

I'm not sure if Appistry actually does any additional QA. I had a few phone conference calls with them and asked them specifically what is your QA process and they seemed clueless about this and just responded with general terms like "we make sure that the package works".

edited January 2013

@sarmadym said: I'm not sure if Appistry actually does any additional QA. I had a few phone conference calls with them and asked them specifically what is your QA process and they seemed clueless about this and just responded with general terms like "we make sure that the package works".

Well, I've met the person in charge of QA and I can tell you she knows her stuff. And she reads every post on this forum, so I expect she's going to go over and do a bit of QA on the sales people...

EDITED: QA of course, not Q&A (I see Q&A everywhere, sorry)

Post edited by Geraldine_VdAuwera on

Geraldine Van der Auwera, PhD

• Posts: 4Member

I'm finding this thread hard to follow because the original post keeps being updated without any notes to say what has changed. Just now I noticed even the figure has been edited - I'm sure the framework licence used to say "free - restricted", but now it says "MIT (free - open)".

That is actually a very good change, as it appears the Broad is now continuing to provide the GATK core framework as open source software under the MIT licence. If that is true, then I am glad you're listening to the strong negative backlash against the original text where it appeared none of GATK would continue under the MIT licence.

@pjacock said: I'm finding this thread hard to follow because the original post keeps being updated without any notes to say what has changed. Just now I noticed even the figure has been edited - I'm sure the framework licence used to say "free - restricted", but now it says "MIT (free - open)".

I'm sorry but that is incorrect. The original post was updated only once and all the edits done at that point are listed at the bottom of the text. You can further see the details of the changes in the change log. You'll see this lists three change "events" made immediately consecutive to each other.

As for the figure, it has not been changed at all; the label for the framework has always read "MIT (free - open)". I can show you the original working documents if you'd like.

That is actually a very good change, as it appears the Broad is now continuing to provide the GATK core framework as open source software under the MIT licence. If that is true, then I am glad you're listening to the strong negative backlash against the original text where it appeared none of GATK would continue under the MIT licence.

That is indeed the case, the core framework will continue to be under MIT license, as stated in the original text (and figure) of this announcement.

The latest development, which is not as such reflected in the original text, is that we have decided to transfer "utility" walkers from the "protected" category into the framework in order to make them freely available. This we did in response to requests such as Brad Chapman's (above); and that is indeed the point of this entire exercise -- to collect feedback and amend our plan to suit a majority of users, before it goes into action.

Geraldine Van der Auwera, PhD

• Posts: 4Member

@Geraldine_VdAuwera said: I'm sorry but that is incorrect. The original post was updated only once and all the edits done at that point are listed at the bottom of the text. ... As for the figure, it has not been changed at all; the label for the framework has always read "MIT (free - open)". I can show you the original working documents if you'd like

My apologies - I accept I was in error. I regret staying up late and commenting while tired.

@Geraldine_VdAuwera said: You can further see the details of the changes in the change log. You'll see this lists three change "events" made immediately consecutive to each other.

Actually I can't - the link http://gatkforums.broadinstitute.org/log/record/discussion/2091 says "Permission Problem - You don't have permission to do that.". Was there any link to the change log on the page prior to your comment?

I think having automated change logs or a revision history for blog posts (once first published) would be a really good idea - something I think most readers would find familiar from their use in wikis.

• Posts: 19Member

Geraldine;

That is brilliant. Thank you for listening, discussing with the group, and for the changes. It's incredibly helpful for me, and I hope will also help with continued open source use and adoption of the framework.

The metrics and open-source geek part of me can't help but ask: why did you decide to include the annotations as part of the dual licensed? The annotator always struck me as one of the most useful places for community development, and I've made good use of annotators other folks wrote. Deriving from existing annotators you wrote is by far the easiest way to get started (PerReadAlleleLikelihoodMap and all, you know) which might become more complicated under dual licensing (I am not a lawyer). The decision on all the walkers makes good sense to me.

Thank you again for listening. It's much appreciated.

Brad Chapman, Bioinformatics Core at Harvard School of Public Health

@pjacock said: Actually I can't - the link http://gatkforums.broadinstitute.org/log/record/discussion/2091 says "Permission Problem - You don't have permission to do that.". Was there any link to the change log on the page prior to your comment? I think having automated change logs or a revision history for blog posts (once first published) would be a really good idea - something I think most readers would find familiar from their use in wikis.

Oh, I guess you have to have admin privileges to see the log. I took for granted that everyone could see it, sorry. I'll see if it's possible to make the logs publicly accessible.

Geraldine Van der Auwera, PhD

@chapmanb said: That is brilliant. Thank you for listening, discussing with the group, and for the changes. It's incredibly helpful for me, and I hope will also help with continued open source use and adoption of the framework.

Thanks, that is our hope as well.

The metrics and open-source geek part of me can't help but ask: why did you decide to include the annotations as part of the dual licensed? The annotator always struck me as one of the most useful places for community development, and I've made good use of annotators other folks wrote. Deriving from existing annotators you wrote is by far the easiest way to get started (PerReadAlleleLikelihoodMap and all, you know) which might become more complicated under dual licensing (I am not a lawyer). The decision on all the walkers makes good sense to me.

The reasoning is that some of the annotators are key to certain core tools, so it makes sense for us to keep them under the same level of protection. But you make a good point about it being easier to derive new annotators from existing ones. I will see if we can release some of the annotators as examples for development at least.

I do have a correction to make; in the list earlier I unintentionally omitted three walkers that will stay "protected" too: PhaseByTransmission, ReadBackedPhasing and DiagnoseTargets. My apologies; I hope this won't cause you any problems.

Geraldine Van der Auwera, PhD

• Posts: 19Member

Geraldine;

The annotations are also useful in other contexts for evaluating variants. I used them for developing methods for filtering variant calls from multiple callers and I'm sure there are other cool reuse ideas that are parallel to your key pipelines. I think this remixing is a critical component of improving bioinformatics tools. With the downstream tools protected, you can commercialize the pipelines you developed, while allowing others to experiment around the edges.

I appreciate your willingness to discuss. I use GATK both with the best practice pipelines and to develop new tools, and it's great that you're open to allowing the continued experimentation and reuse of the framework. Thanks again.

Brad Chapman, Bioinformatics Core at Harvard School of Public Health

• Posts: 4Member

There isn't enough room here, so I have put my thoughts on my blog:

http://biomickwatson.wordpress.com/2013/01/28/gatk-why-it-matters/

I hope some of what I have written makes you think a little :)

• Sanger, Cambridge, UKPosts: 70Member
edited January 2013

On the whole being able to read the source code is better than not and the more objectionable portions of the license are gone so this makes it easier for me. It's not ideal but at least we can see what processes are being performed on our data.

The one area I am slightly worried about, however, is the situation with Appistry and the source code. Can you confirm the Appistry GATK will be built from the same source code as the Broad GATK? This is obviously important because if I quote GATK version xxx.yyy.zzz against my academic copy, and someone reads that in a paper and tries to reproduce it on their commercial GATK it would be somewhat worrying if because of some slight bugfix made by Appistry the results delivered by their copy were different.

Incidentally, I'm not sure I see why it would be in Appistry's interests to restrict their users from seeing the code behind their version if that code is the same as the Broad version (and I realise this may not be your decision). I could understand them wanting to charge for certifying any of their users' customisations to the code so that they can be supported though.

Post edited by TechnicalVault on

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

• Posts: 4Member

@biomickwatson said: There isn't enough room here, so I have put my thoughts on my blog:

http://biomickwatson.wordpress.com/2013/01/28/gatk-why-it-matters/

I hope some of what I have written makes you think a little :)

@TechnicalVault said: The one area I am slightly worried about, however, is the situation with Appistry and the source code. Can you confirm the Appistry GATK will be built from the same source code as the Broad GATK? This is obviously important because if I quote GATK version xxx.yyy.zzz against my academic copy, and someone reads that in a paper and tries to reproduce it on their commercial GATK it would be somewhat worrying if because of some slight bugfix made by Appistry the results delivered by their copy were different.

Actually, the Appistry GATK package is built from Appistry's fork of our repository; it follows ours very closely but there can be any number of changes (typically fairly minor) to the source code. The Appistry versions are numbered separately from ours, although the naming conventions do contain information that makes it possible to identify which academic version it is in step with. To address your concern, Appistry will provide the necessary documentation and support to licensed users to resolve any differences that occur between the academic and the commercial versions. For further details on this I would advise contacting Appistry directly; they will be better able to explain their processes.

Incidentally, I'm not sure I see why it would be in Appistry's interests to restrict their users from seeing the code behind their version if that code is the same as the Broad version (and I realise this may not be your decision). I could understand them wanting to charge for certifying any of their users' customisations to the code so that they can be supported though.

This is indeed intended to mitigate the effects of users making modifications to the code. Since Appistry provides direct customer support, having users modify the code can potentially greatly increase their support burden, so they need to have some control over this. But I don't expect they would have any problem with their users looking at the source code to understand how the tool algorithms work, for example. Again, if you're interested in knowing more about how Appistry handles this, don't hesitate to contact them directly.

Geraldine Van der Auwera, PhD

• Posts: 1Member

I'd like to create a Homebrew formula for GATK, but the click-through license makes it impossible, since there doesn't seem to be a direct URL from which the source can be downloaded. Is there a direct URL?

We have no plans to provide a direct URL, but when we release version 2.4 (this week or next) it will be possible to clone the full source code repository, which you can update directly via Git. If you're comfortable compiling from source that will be the most convenient way to keep your local install up to date.

Geraldine Van der Auwera, PhD

• Sanger, Cambridge, UKPosts: 70Member
edited January 2013

@Geraldine_VdAuwera said: To address your concern, Appistry will provide the necessary documentation and support to licensed users to resolve any differences that occur between the academic and the commercial versions.

If you could see fit to persuade them to commit to providing this information publicly that would be helpful; gain goodwill; lessen support burdens and cost them nothing.

The scenario I envisage is a non-technical commercial user who is unaffiliated to myself trying to reproduce the results of my analysis which I produced using the academic version and published in a paper. There is an inexplicable difference in the results and they write to me to complain my results aren't reproducible. Because I am not a licensee I would not have access to the documentation so I wouldn't be aware of any differences that may or may not exist, nor can I reproduce their environment.

Post edited by TechnicalVault on

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

@TechnicalVault said: The scenario I envisage is a non-technical commercial user who is unaffiliated to myself trying to reproduce the results of my analysis which I produced using the academic version and published in a paper. There is an inexplicable difference in the results and they write to me to complain my results aren't reproducible. Because I am not a licensee I would not have access to the documentation so I wouldn't be aware of any differences that may or may not exist, nor can I reproduce their environment.

OK, I'll let Appistry know that this is a concern. But based on my interactions with many users including commercial ones I would expect that the commercial user would first go to Appistry's support and ask them why they can't replicate your results (and ask about version differences) before coming to you. When people aren't getting the expected results, they tend to first assume that the software isn't working properly, rather than question the expectations. I'm not sure what this means for science as such but from the point of view of your use case it should help fend off complaints...

Geraldine Van der Auwera, PhD

• Posts: 3Member

I am very happy. I have developed an external package PooledWalker (for analysis of Pooled DNA projects) to an older version of GATK (which requires compiling the whole GATK), so closing the source meant I could no longer publish it. I'll move it to the 2.3+ framework and publish it now.

Glad to hear it! FYI you might want to wait another week until we release 2.4 to start fwd-porting, because we've moved some things around in the framework.

Geraldine Van der Auwera, PhD

• Posts: 15Member

We are one of the early commercial access customers of GATK v2 and I must say we are not happy with the whole arrangement... First of all, we only have access to the very old version (2.2 somethings) and this version is riddled with bugs...We have to wait until the END OF MARCH to get an updated version (which I am sure is just as old as the one we are using now) and in the mean time I cannot use any of the tools since things like the HaplotypeCaller (which is really the only reason we would even consider taking out a license) is buggy as are many of the other tools in version 2.2.

It also seems that the only thing Appistry is doing with many of my bug reports is sending them off to the Broad, basically just inserting them as a middle man, just delaying the whole process for us...I have a very hard time convincing my management that taking out a license for GATK is worth it, since there are plenty of alternatives for us for many of the tools...

I can't see that Appistry did ANY QA since some of the basic tools were broken in the version they want us to fork over $100,000 for.... I would suggest you follow the RedHat model, if you feel that the burden of support is too much and have us commercial customers pay for support rather than restrict us access....

Thanks

Thon

Genomic Health, Inc.

• Posts: 6,398Administrator, GATK Developer admin

Hi Thon,

I understand your concerns, however please keep in mind the following:

First of all, we only have access to the very old version (2.2 somethings) and this version is riddled with bugs...

• The current academic version is 2.3, so version 2.2 is hardly a "very old version". We do have many users still using versions older than 2.3 without too much grief. Note that many bugs in the newer tools are caused by edge cases which can be indicative of issues with data quality. The solutions we implement to deal with these are often to handle them explicitly in a way that prevents the GATK from blowing up (e.g. pass over the bad data), but they don't solve the underlying data issues. If you find yourself encountering many such issues, you may need to check your data quality more thoroughly before feeding it to GATK.

• If it's any comfort, version 2.3 is buggy too. Every version we release tends to be buggy in new and interesting ways, because with each version we introduce new features that are in some way or another susceptible to edge cases. We can't conceivably test for every eventuality, so when we release new features into the wild, it's like we're crowd-sourcing the testing process (sorry, folks). That is the nature of cutting-edge research software -- it cuts both ways.

things like the HaplotypeCaller (which is really the only reason we would even consider taking out a license) is buggy as are many of the other tools in version 2.2.

• We appreciate your interest in the HaplotypeCaller and we are working hard to speed along its maturation from a bleeding-edge research tool into a stable production-ready weapon of mass discovery. I sincerely believe your pain as an early adopter will be rewarded as your experience with its functionalities will put you several lengths ahead of competitors who adopt it later in the game.

We have to wait until the END OF MARCH to get an updated version (which I am sure is just as old as the one we are using now)

• The commercial version slated to be released next month will be version 2.4, and will be entirely up-to-date relative to the academic version (we are still working on it so 2.5 will definitely not be out before mid- to end- April at the very earliest).

It also seems that the only thing Appistry is doing with many of my bug reports is sending them off to the Broad, basically just inserting them as a middle man, just delaying the whole process for us...

• We are the developers of the GATK, not Appistry. Their role regarding support of commercial customers is to field issues and either (1) respond directly if the issue is a matter of usage and/or documentation, or (2) refer to us if the issue is a genuinely new bug report. They have no business fixing bugs in the codebase; we own that responsibility entirely. You may think that this makes them a useless middleman; but I (as sole Broad-side "tech support agent") can tell you that they provide an invaluable service by filtering out at least 80% of support requests from commercial users. Put another way: the overwhelming majority of support requests we (and they) get are not true/new bug reports, they are questions of usage and/or documentation. Appistry handles those on their own for commercial customers. But if they get a real bug report, they have to send it to us for fixing.

This setup doesn't delay the support you get; in fact it means you get faster support for non-true/new-bug issues because you don't have to wait for me to finish dealing with the twenty other user questions I woke up to that morning.

I can't see that Appistry did ANY QA since some of the basic tools were broken in the version they want us to fork over $100,000 for....

• Similarly, "doing QA" doesn't mean fixing bugs, it means making sure that the product meets certain specific quality requirements. That includes catching bugs and many types of documentation errors (here's looking at you, Nicky), but when those are found, for the most part Appistry aren't supposed to fix them, they're supposed to send them along to us for fixing. If there's been any perceived delays in fixing bugs, it's entirely our responsibility.

I have a very hard time convincing my management that taking out a license for GATK is worth it, since there are plenty of alternatives for us for many of the tools...

• We don't wish to force anyone into using our tools if they feel there are alternative tools that perform to the same high standards of performance and accuracy.

I would suggest you follow the RedHat model, if you feel that the burden of support is too much

• I have explained elsewhere (see my response to point 6) why we don't consider the RedHat model of support to be a viable option.

Finally, please continue to address any and all support requests to Appistry. Craig enjoys the attention :)

Geraldine Van der Auwera, PhD

• Posts: 15Member

Thanks for the comments...I do know that the Broad is the one responsible for fixing the bugs etc. and it is good to hear that with the next release we will be up to par with the academic version, but I fear that once some bugs are detected in 2.4 (and let's face it, it sounds like quite a re-write of the framework, so there are bound to be some pretty devastating ones in there), we commercial users have to wait a complete quarterly release cycle to get the bug fixes...

In any case, since we are still on version 2.3-9, where is the LITE version? I can't see it on the main download page and it states that from version 2.4 LITE will not be available, but we ain't there yet! I have some tools I use sometimes from the LITE version that were broken in my commercial version (SelectVariants mangles the AD fields on some occasions) and I would like to see if that is addressed in the latest version...

Thanks,

Thon

Thon, you wound me... devastating bugs, really? I'll admit most of our releases come with a flurry of annoying little bugs and mishandled edge cases, but I don't believe we've ever let something you could qualify as devastating slip into a public release. The hyperbole is a little unnecessary, surely. In any case you'll be happy to know that the modifications we're currently making to the framework and toolkit ahead of 2.4 are not so much re-writing things as deploying an epic number of systematic tests, so release 2.4 should be our least buggy ever, to date. Personally I'm working on making significant changes to improve the Tech Docs as a direct result of Appistry's QA process, as well.

Re: LITE version, that is in fact completely unintentional -- the updated "post-2.4" website changes were released early by mistake, so thanks for bringing that to my attention. It would be difficult to revert that now but I'll put up a link to the latest version archive. In the meantime you can follow this link to download the source archive and compile locally (ant dist). That is still all Lite, ie free to use for all, until we release 2.4.

Geraldine Van der Auwera, PhD

• Posts: 1Appistry Employee
edited February 2013

Hello,

My name is Trevor Heritage, and I am VP of sales and marketing at Appistry. I worked closely with the Broad Institute throughout the decision process around commercialization of the GATK, including the rationale for that decision and the details of the licensing arrangements. All of us on Appistry's GATK team have been monitoring this thread, but until now we have refrained from responding to the various remarks about our company, our business policies, and our service because this is the Broad's forum for their user community, and comments regarding Appistry should, frankly, be directed to Appistry.

I particularly want to hear about your evaluation experiences, as the whole point of an evaluation is to collect feedback that we can use ourselves or communicate to the Broad to improve the toolkit as well as our commercial offer and services.  Remember, if you have general licensing or price related questions, then your best course of action is to talk directly to Appistry. We’re not some faceless corporation; we're good people, we're approachable and we're entirely willing to talk about any misgivings or issues you have.

Appistry was chosen by the Broad to offer GATK users high quality commercial support; this is exactly what we offer today and have been offering since the decision to commercialize GATK was made.  So, let's leave the Broad team to do what they do best -- developing a phenomenal piece of software that improves genetic research and enables scientific insights that were not previously possible. We’ll take responsibility for determining, with your help, how we can best serve commercial users of the GATK.

I will now let Geraldine continue to run her forum. I hope that more of you will reach out to us/me directly so that Geraldine can spend more of her valuable time providing support to users with scientific issues.

Post edited by Trevor_Heritage on
• Posts: 15Member

Geraldine, I am sorry if I wounded you with my comment about devastating bugs...I did not intend that! I was merely speculating that it was a possibility as I suspected that a new framework was going to be a re-write...It is great to hear that 2.4 is more about implementing a large number of tests, which will surely reduce the number of bugs! I feel very hopeful for the new version...It's just that we commercial users have not been able to try any of the new functionality since 2.2-16 was the version we got and HaplotypeCaller was just not producing correct results, so we never got to determine if taking out a commercial license was of value to us...

Trevor, I've sent you an email to set up a conference call to discuss directly, but since Geraldine asked for public comments, I will continue to make my comments public :)

• Posts: 15Member

Oh Thon, again with the drama? You're not banned from using the forum; it just so happens that I know you're a commercial user and are (or were) an Appistry customer because you've been a little bit... vocal about it. I have no insight into how long evaluation terms last, so I assumed yours was still running. But hey, if you're all done with Appistry and set on going forward with Lite, you're welcome to ask me questions again. I've got nothing against you, I was just enjoying being able to punt some of my workload to Craig :)

Geraldine Van der Auwera, PhD

• Posts: 15Member

Yeah, we Dutch guys are known for our drama :)...But all kidding aside, I just wanted to see clarified what the procedure is, since it is important for us to know where we stand with GATK...It's a significant part of my work at the moment but I am getting a lot of pressure from management to show it is worth all the drama :) So far it is!

Hah, geen probleem, makker -- I'm Belgian. We're pretty chill.

I'm happy to clarify any lingering questions -- or refer you to the appropriate person when I don't know the answer.

Geraldine Van der Auwera, PhD

• Posts: 10Member

Hi,

Would you please point out the location for the last available "Lite" version? It seems with the 2.4 release, the lite version is not available (even older versions.) Thank you!

Sorry @JTeer, since we no longer support older versions of GATK, we also don't provide the corresponding binaries. Perhaps you can ask colleagues if they have a copy they can share with you.

Geraldine Van der Auwera, PhD

Addendum: you could also download the source from the Github repository, check out the last Lite version and build from source, if you are comfortable doing that.

Geraldine Van der Auwera, PhD

• Posts: 10Member

Thanks @Geraldine_VdAuwera. The tree is a bit complex (particularly with the license changes) so I want to confirm that the following location, using the "2.3" tag, is the last Lite version: gatk-protected->public. This seems to contain the UnifiedGenotyper (where gatk-protected->public 2.4 does not), and does not contain the HaplotypeCaller, so I think that's correct, but I don't see the Lite license, and need to confirm this is the Lite version. Thanks!