Discussion of the new licensing and mixed open source model

Geraldine_VdAuwera Posts: 6,407 Administrator, GATK Developer admin
edited January 2013 in Ask the GATK team
This discussion was created from comments split from: GATK 2.0 announcement.

This was done because the new licensing and mixed open source model has turned out to be the object of much debate. We want to encourage discussion on this topic without obscuring the GATK 2.0 announcement thread, which is dedicated more so to the GATK 2.0 software itself, particularly the new tools and improvements.

So, have at it!

Geraldine Van der Auwera, PhD

Comments

  • wileyg Posts: 1 Member
    Are you going to encrypt or otherwise watermark my vcf's such that I am unable to send them to my many non-institutional collaborators?
  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin
    No, we are actually revising the language to make clear that we are happy with individuals redistributing results, so long as they are from an academic non-commercial entity.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • jtal Posts: 1 Member
    Just to clarify: is it OK to redistribute results as long as you are sharing with someone from AN academic non-commercial entity, or only as long as they are from the same academic non-commercial entity where the results were produced?
  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin
    I've just updated the license and FAQ language. If you are part of an academic non-commercial entity, it's fine to run GATK2.0 and distribute results to anyone. This is true even if you are running it as part of sponsored research, which is technically "commercial". Note that if you aren't in an academic non-commercial entity, or don't like the license, GATK-lite only contains tools licensed under the MIT license, and you are completely free to do with it whatever you see fit. Does that help?

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • Pepetideo Posts: 11 Member
    Pretty sad move for GATK. When even big companies are starting to see the benefits of open source, GATK goes in exactly the opposite direction. Can anyone tell me how this makes any sense? If I, as a commercial organization, decide to buy a licence, I cannot run my clients' data and give them the results of the analysis.

    That is almost like buying an Office licence and not being able to share Excel files!
  • Pepetideo Posts: 11 Member
    Just one more thing... I was just browsing the Broad Institute webpage and this paragraph made me laugh:

    http://www.broadinstitute.org/what-is-broad/broad-approach

    "Share openly. Seizing scientific opportunities requires creating methods, tools and massive data sets — and making them available to the entire scientific community to rapidly accelerate biomedical advancement."

    Unless you do not accept that commercial entities can also be part of the scientific community, this move to make the GATK closed source is completely at odds with your mission statement!
  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Hi Pepetideo,

    You can share your GATK results -- that was a language slip up on our part. You can see it's clear in the license and FAQ now.

    As for why, it's twofold. One is to ensure that we can continue to develop and support the GATK into the future by creating a sustainable revenue source for the team. Two is that a commercial version will be able to support a large team providing tier-one support, such as long-term maintenance of specific GATK versions, which my research group simply cannot provide. Note that any commercial entity that wants to stay with GATK-Lite can go the full open source route, at the cost of forgoing premium support and access to the best possible tools.

    We recognize that this is a change, and of course we are big supporters of open source software -- the vast majority of the GATK2 is open source. We considered creating a Mozilla-style "GATK foundation", accepting micro-donations, or even providing pay-for-services on top of the GATK, but ultimately the commercial/non-commercial divide seemed the option that provides the most value to the entire community.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • Pepetideo Posts: 11 Member

    So to be clear... If I am a commercial entity and I purchase a license for GATK2 and build a sequencing analysis pipeline using it, can I then give VCF files of the results of my analysis and supporting BAM files to my clients or not? I am sorry for insisting on this, but I read through the FAQ and could not find a straightforward answer, and I believe this is very important to get right.

  • Pepetideo Posts: 11 Member
    edited July 2012

    I am really sorry you decided to go the closed source route... I am sure the option to allow micro-payments would be a better one. You could even create a special award (recognition stamp) to be given to entities that supported your efforts. I am sure many commercial entities would pay to be given that acknowledgement (and additional tier-one support).

    Something like the "Intel Inside" (R) logo.

    Companies like to show their clients that they are supporting research and development efforts as well regarded and as widely used as GATK, so being able to boast about a genomic pipeline being "powered by GATK" would probably bring in more than enough revenue to keep it open source.

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Hi Pepetideo,

    For comment (1), certainly this is something we are actively interested in enabling but it will depend on the details of the commercial license terms, which we are actively refining. At this time I would differentiate between two separate activities.

    • Purchasing a commercial GATK2 license will give you the right to run the GATK2 within the company and share / publish / etc. your results. This is what I'd think of as a standard commercial license, and most places would fit in this bucket. The example here is buying Adobe Photoshop and using it in house to manage and edit photos.
    • The more complex question is around third-party pipeline executors, which only take in data from others and effectively sell the running of the GATK. Here I think there will be a separate license with specific terms, but it's something we'd like to enable. The analogy here is setting up a for-profit web portal for photo editing that backends to Photoshop: a valuable activity, but one not covered by the standard end-user license agreement.

    As for comment (2), this is an excellent idea. I think we are likely to offer a workshop and validation service (all details need to be worked out) that offers a "gatk inside" logo to commercial entities that attend the workshop and pass some kind of best practice exam or other similar inspection / validation / etc process. How this all would work out isn't yet clear but we'd like to make sure there are resources available to places to feel confident that they are using the GATK in the best possible way for their data and process, and to receive some kind of certification that vouches for this.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • Pepetideo Posts: 11 Member

    Good to know!

    I am glad that it was just a badly worded paragraph in the licensing terms. I completely understand the limitation/restrictions you want to impose on commercial third-party pipeline executors.

    I still hope you can find a way to support your work while keeping it open source, since that is a much more dynamic development environment.

    I am sure paying customers will demand much more of you and contribute much less to your development efforts. I hope this does not hinder you more than it helps.

    Best wishes

    Duarte

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Hi Pepetideo, thanks for the feedback. The vast majority of our code is still open-source, and over the long-term we hope to push all of the currently closed components into the open-source part of the release.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • TechnicalVault Sanger, Cambridge, UK Posts: 70 Member

    I think the biggest problem I have with this is not really the commercial restrictions but the fact that you've closed the source to what will be key components of an analysis. You are in effect asking us to take it on authority that you have your code and algorithms right, and that bothers me somewhat in a Nullius in Verba kind of a way. You are not the first closed source analysis tool in genetics, but you certainly are one of the more key ones.

    The other thing that concerns me is that I note that you say you "hope" to open the closed portions eventually. That doesn't sound like a guarantee, are you in a position to give us anything more concrete than hope?

    On a more practical note we've seen this model before, particularly with imputation and I can't count the hours I've spent cursing an obfuscated backtrace which has appeared when I needed the results yesterday. I assume you are going to step up your support provisions at least for the closed portions?

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Hi there,

    Let me have a look at this in more detail. In principle I'd be happy to open source everything so long as this is compatible with the non-commercial restriction. I'm going to open up this discussion with our Business Development office and see what's possible.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • luca_beltrame Posts: 10 Member

    The problem is that you're trying to restrict usage, am I correct? No (or few) open source or free software licenses allow for restricting usage. Their concern is distribution. At least, you may want to consider Creative Commons (ShareAlike-NonCommercial) if that's the case. It will severely hurt adoption in third-party projects, but it's still better than a completely closed "black box" environment (and I second TechnicalVault's objections as well).

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    I've been looking at options in this area. The ShareAlike system is attractive. Currently I'm leaning towards a multi-license model, where our currently closed source models would have source code shared but with a non-commercial use license associated with them. Academics (and everyone really) could view the code, but as a commercial entity you would need to obtain a commercial license to use the code. But we are looking into more options.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • David Posts: 1 Member

    A very disappointing decision to limit commercial use and create unwelcome complexity. I can see how an unwise and greedy move can limit and even destroy the great potential of this promising tool. In an open-source era, I doubt you will be able to generate much revenue by selling a commercial copy. There are so many alternatives. Actually, there are many creative ways to generate revenue; your way is not one of them, and it will be a lose-lose for the whole scientific community. I suggest you separate commercial use for research from commercial distribution, and only charge for the latter.

  • luca_beltrame Posts: 10 Member

    I don't think there's an easy solution, but for sure, going closed is not going to work. What if GATK development is halted (yes, unlikely, but this always needs to be considered) or there's simply a loss of interest in specific features? It's not just about having the source code available for inspection (and review by peer-reviewed journals) but also about attracting other contributions. How can one submit a patch to a closed project? This reminds me of a (heated) debate I had when pushing out software for a paper. I wanted LGPL (or at worst, Apache 2.0) and one of the authors wanted an "on request" model. Similar topics to the ones raised here.

  • TechnicalVault Sanger, Cambridge, UK Posts: 70 Member

    From this side of the pond the Wellcome Trust Sanger Institute does have a policy on the software we develop here, it has to be open sourced and under specified licenses. This is in harmony with our policy that research funded by Wellcome Trust money must be published in an open access journal.

    I can't speak for the institute itself, but I have a sinking feeling this decision will spark a lot of debate. What concerns me, and what I intend to find out about, is how this will interact with any collaborations on or contributions we wish to make to GATK. UK charity law is quite tight on what kind of profit-making activities charities can take part in, so it may involve lawyer time.

    I do understand the Broad's point of view, people are making money on software that the Broad has invested money in producing and the Broad is not getting a cut from it. Ideally the way they'd pay it forward would be to contribute testing time and improvements back, but in practice I imagine quite a few are taking a free ride. That said companies take a free ride on most of the research we do, it's just harder to make money from most of it though. This whole debate does bring the name Celera to mind though.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Geraldine_VdAuwera Posts: 6,407 Administrator, GATK Developer admin

    "I do understand the Broad's point of view, people are making money on software that the Broad has invested money in producing and the Broad is not getting a cut from it. "

    Well, it's more than that, really -- a lot of the Broad money actually comes from grants and so on. In effect this software is paid for by the US taxpayers. Speaking for myself (not for the organization), as a scientist AND a taxpayer, I object to seeing companies use software paid for by the people, for free, then turn around and sell the results back to the people for a profit. I think it's fair to ask commercial companies to contribute, especially since that ensures that we have the resources to keep improving the software for the research community... It's positively Marxist :)

    Geraldine Van der Auwera, PhD

  • Pepetideo Posts: 11 Member
    edited August 2012

    I really have to disagree with you Geraldine...

    The taxpayer contributes to these efforts to accelerate innovation. Don't forget USA companies are using these tools to sell their services to the internal market and to other countries, generating revenue and jobs.

    And GATK is also a beneficiary of open source research done in other countries all over the world.

    If you want an analogy to this... imagine if Mozilla had not come out with an alternative browser to IE. How long do you think it would have taken for the web to evolve to the point we have it today? Canonical and Red Hat are some examples of companies that have been able to develop successful businesses using open source models.

    The same thing can be translated into the NGS field. Closed source development hinders innovation and free exchange of ideas.

    Thinking of the issue in simplistic terms such as Commercial companies leeching-off the taxpayer is IMHO missing the point completely!

  • Geraldine_VdAuwera Posts: 6,407 Administrator, GATK Developer admin

    Or not so much missing the point as addressing a different point entirely... Specifically @TechnicalVault's last paragraph concerning revenue, which is not necessarily tied to the open/closed source issue.

    Speaking strictly about the revenue question, please don't caricature what I said. I did not call companies leeches; I merely brought up the point that they are using a free tool for profit-making purposes, at the partial expense of the nation's taxpayers. I don't see how it's a capital offense to ask them to contribute to the development effort more directly than they have in the past. And I certainly don't think charging companies a reasonable fee for commercial use is going to stifle science and medicine... Especially considering it will give them access to much better user support (and I say that as one of the people responsible for the "lesser" support). Those folks are going to get much better service (which will remove obstacles to innovation), because there will be money available to pay for dedicated support staff (which will create jobs). That has significant added value.

    Re: the source issue, I understand that you care deeply about keeping tools like these open source, and to be honest I rather prefer that myself. Keep in mind that the overwhelming majority of the GATK is still open-source; the only closed components are the new tools, and even that might change.

    Geraldine Van der Auwera, PhD

  • Pepetideo Posts: 11 Member
    edited August 2012

    I am sorry if I missed your point of view, but I still think that the view that because the taxpayer has supported these institutions these companies should pay because they are selling a product back to the taxpayer is a circular argument that does not make much sense to me.

    As for the leeches definition you are right.. it was my choice of word, but it fits the definition of any entity that uses a free product and does not contribute back to its development...

    Unfortunately there are a lot of companies out there that do exactly that. But for every company that does that, there are many other companies/people that submit bug fixes and actively contribute to improve the product, and you will lose many of those with this closed source/pay-for-license model.

  • Geraldine_VdAuwera Posts: 6,407 Administrator, GATK Developer admin

    I see it more as a circular flow of money, but let's just put that down to philosophical differences.

    Hmm, your definition of leech is pretty broad (pardon the pun). To me a leech is someone who takes from the common pot and expressly avoids contributing back when they're asked to (like people who cheat on their taxes). I don't blame people/companies for using something that's offered for free -- they'd be fools not to if it's a good product. If the time comes when they're asked to contribute back and they don't, then yes they are leeches. But if they are gracious and chip in, it's all good. Of course this is a very different model from the ideal situation, in which people spontaneously contribute according to their means or abilities without being asked... A difficult path to walk in this world.

    Geraldine Van der Auwera, PhD

  • joshkorn Posts: 1 Member

    Companies pay taxes too. Some additionally indirectly support the development of tools such as the GATK through academic collaborations. I guess the way you are framing the question though, it comes down to this: taxpayers paid for the development of the GATK. Most taxpayers aren't doing genetic research. So what will benefit the taxpayers most? (Should companies be paying for the reference genome sequence? For SNP databases? Where do we draw the line?) The reason the government (taxpayers) invest in basic research is to stimulate the downstream discovery. Help us translate research into helping patients!

    I even take exception to the phrase "If the time comes when they're asked to contribute back and they don't, then yes they are leeches." Who is asking to contribute back? Not the people who paid for the development in the first place (NIH, Eli & Edythe Broad, Harvard, MIT)! The GATK became widely used not only because it was good (it is), and not only because it was free (it was), but because of a huge investment from other projects (most notably the 1000 Genomes Project, but others as well) that got it free publicity and turned it into a de facto standard. It's hard to compete when the GATK team has earlier access to taxpayer-funded projects/data/sequencing, and guaranteed publications when these projects come out. Also, gaining market share and then raising prices sounds more capitalist than Marxist, to reference an earlier comment.

    I don't mean to go on a tirade; I guess I just feel strongly about this. I definitely understand where you are coming from; I too have written popular open-source software that cost me personally plenty of time of support, and meanwhile surely helped some for-profit entities do research (I hope!) and one for-profit company in particular have success. Nonetheless, I knew it was my duty to share the software freely. Also please know that I say all this with the utmost respect for the whole GATK team (most of which I think I know--I don't think we've had the pleasure of meeting though, Geraldine). You guys are doing a great job, and it's wonderful that taxpayers have been able to fund the development of (documented, supported) academic software. I'd be happy to take everyone out for drinks after work some day and thank you personally; justifying paying for the software is difficult.

    *Note: these views are my own, and do not necessarily reflect those of my company or colleagues.

  • ebanks Posts: 683 GATK Developer mod

    Yeah, I'm in for beer on Josh.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • Geraldine_VdAuwera Posts: 6,407 Administrator, GATK Developer admin

    @Josh, you make a whole bunch of fair points. Yet perhaps some are made in response to something I never meant to say, but which seems to be what got across: that I am advocating making companies pay through the nose because they are all rich, greedy and antisocial. That's really not my position at all though.

    The "leeches" thing, for one, seems to be dogging me, even though it was not my phrase to begin with (thanks @TechnicalVault! ;) ). The phrase you find so objectionable wasn't intended strictly in the context of the GATK model, it was more an abstract discussion on what I would theoretically consider to be leeching -- my bad for not making that clear. It could use some nuance as well.

    For the rest -- I agree with most of what you said. Goodness knows I wouldn't want to see things like fees for using reference sequences or SNPs -- that would be preposterous, like patenting genes, and I hate that people are actually trying that. But surely we can agree that tools and data are different things? Naively I would think that line is pretty clear. I'm also very aware that GATK wasn't made in a vacuum; like when a very high profile figure recently said "You didn't make that", it takes a lot of infrastructure and effort from other people to make something like this a success. No argument from me there.

    But I still don't see why charging a reasonable fee is such a bad thing. Note the word "reasonable"... To be honest I have no idea what the exact pricing scheme is going to be (that's way above my pay grade) but my understanding is that it's supposed to be adapted to the usage the company makes of it. The idea being that you wouldn't think of charging a small startup that runs GATK on 3 genomes a month the same amount that you would charge a company that plans to install GATK on their server and charge their customers (researchers!) for running their data on it, with hardware and network as their only real expense... In the latter case, can you really argue that they shouldn't be paying something for the use of the tool?

    The final point I want to make is to reiterate how having this system in place is going to drive improvement of the GATK in two ways, as far as I can see. One is that the added revenue will buy us more hardware and more people, as well as make us less dependent on grants. The other is that because all the commercial users will be getting their own separate, dedicated support service, it will lift a non-negligible portion of the support burden from the team. As the user community grows, so does the amount of work it takes to field questions and troubleshoot issues. As you mentioned yourself, GATK is documented and supported, to an extent that is not so common for academic software. And we're working hard to improve that as much as we can (please say you like the new website). But that's time taken away from further development. As you probably know, we are all scientists and/or software engineers. The closest thing we have to a full-time support person is, well, me (no we haven't met, I just recently joined the group), because outreach is in my job description -- but it's not actually supposed to be a full-time thing (also, Eric gives me all the answers to the hard questions). So the new model would help us get on back to the development work, which is good for everyone.

    Hah, I think my tirade beats your tirade. But we should definitely continue this over drinks. I make big hand gestures when I argue a point, and gesturing at my computer isn't half as much fun.

    Hey, there's an idea... instead of charging them directly for the use of the GATK, have the companies buy us beer. Not sure it will increase productivity but it sure will lift burdens :)

    *Same disclaimer, by the way -- I don't speak for anyone else but me (and even I don't agree with me sometimes).

    Geraldine Van der Auwera, PhD

  • TechnicalVault Sanger, Cambridge, UK Posts: 70 Member
    edited August 2012

    Whilst I didn't use the word leeches first (that was @Pepetideo) I probably should take the blame for bringing the concept into the argument.

    Charging on number of invocations, or bases processed will probably be a non-starter just because of the complexity of enforcing it and the fact people will hate it. There are going to be situations which are complicated, for example if someone commercial is paying to run GATK on a university farm or someone from a university is running GATK on a commercial cloud such as Amazon. This is nothing compared to how complicated it gets if a university team is collaborating with a commercial company (pharmacogenetics etc). I hope for your sakes and for everyone else's sanity that in the desire for fairness the licensing model doesn't get excessively complicated.

    Unfortunately though I think the intended thrust of my argument got missed, and that was the chilling effect on collaboration angle. People will hold back from contributing for ideological reasons, and because the policies of their workplace may preclude it. The possible upside to this is that already I've heard mutterings about putting more resources into samtools development once again. I don't think it's entirely healthy to have one true genotype caller or any piece of software for that matter. In scientific research, nothing spurs innovation like a good argument about the right way to do things.

    P.S. If you're ever in the "other" Cambridge beer is a capital idea.


    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Pepetideo Posts: 11 Member
    edited August 2012

    Yes it was me... guilty as charged.

    Maybe that word has an even more negative connotation than I was trying to give. I was just referring to the behavior of some companies being like a one-way stream (take freely available tools, use them and do not contribute back ever, not even to report bugs).

    I do not speak on behalf of my company either, but I am pretty sure that if you guys are ever anywhere near Oxford, some of my colleagues and I would love to buy you some pints of our best beer (you guys deserve them) :-)

  • Mark_DePristo Posts: 153 Administrator, GATK Developer admin

    Hi all,

    I want to chime in with several clarifying points:

    • We don't yet know the pricing scheme, but we are keenly aware of the complications of per-use licensing that TechnicalVault brings up.

    • Overall I want everyone to tone down the moral issue surrounding commercial licensing. The discussion of moral issues, extracting of rents, leeching off taxpayers, are all counterproductive to helping understand what we have decided and the best path forward. All of this is just business, after all.

    • The NIH is clear that when it funds basic research, the supported IP is generally owned by the developing institution (I'm sure there are exceptions), and this is true for software in general. The only key software restriction I know of is that the software must be made available to federal employees upon request. The reason for this policy is obvious -- it would be extremely difficult to accept the trade-off of a grant against IP ownership if you are creating high-value IP. Even federal SBIR grants spell out clearly that the government does not own any IP associated with the support. The federal granting system is meant to foster innovation, not to own innovation. It's a subtle difference, but important for anyone creating real IP value with federal money in the US.

    • Many users of the GATK would like a much higher level of support than the Broad Institute could possibly provide, as this is off track for the Broad's mission to transform medicine through genomics. We believe that having a commercial license for the GATK will allow us to actually deliver on this superior support and continue to grow the GATK as a reliable standard for NGS analysis in the commercial sphere and beyond. Without a commercial version we simply cannot follow through on this opportunity.

    • We attempted to make the GATK easy for others to contribute code to, but our experiences in this area have been disappointing. Although many people use the GATK for developing tools (and we are committed to ensuring the programming framework and libraries remain MIT licensed), we have had little contribution to the master codebase from independent third parties over 3 years. Certainly some of our collaborators have contributed impressive tools and extensions, but again they aren't really independent. There's a good Wikipedia article on the similar experiences of MySQL, which is released under a dual license similar to our approach. Still, I'd like to release the source code to all of the tools -- if we can find a way consistent with the commercial license -- for transparency to the community and to allow others to contribute, in so far as they like.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • BobMorris Posts: 1 Member

    At least as egregious as the closed license is the Indemnification clause in Section 5. By agreeing to the license, the licensee is agreeing to pay the legal and other expenses of Broad in case somebody sues Broad over some perceived harm due to the Licensee's use. Academic users who bother to consult the people in their institutions with the actual authority to execute licenses are very likely to find that their institutions are unwilling to enter into such indemnification clauses. Is Broad willing to waive Section 5 for academic institutions that will not agree to it?

  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    Hi Bob,

    I spoke with our legal department and they said "We are happy to make GATK2 available to academics for free but the Broad can't incur liability as a result of that, which is the reason for the indemnification language. The indemnification states that if the Broad gets sued based on the academic user's actions then they indemnify us - it's within their control to use the program appropriately. This language is very standard and any institution should be familiar with it."

    Have you reviewed the language with your institution? If not, you may want to consider getting expert advice on this topic.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 70Member
    edited August 2012

    License part 2.3 is terribly problematic for me: "LICENSEE further agrees that the PROGRAM shall not be rewritten in another computer language or otherwise adapted to circumvent the need for obtaining a license for use of the PROGRAM." This essentially forbids me from contributing similar changes to samtools and is not something I can agree to. I'm concerned this clause could necessitate an institute-wide ban on GATK 2.0 usage in order to avoid our developers being unable to write a large portion of sequencing software. It seems like a "non-compete"; how on earth did it get in there?

    As for the reverse engineering clause, I find that problematic for the reasons I stated earlier. If you want to publish results based on the output of a program, you need a description of what that program is doing in your methods section or to be able to cite a paper that does that for you. Thus any "secret sauce" that could be discovered from reverse engineering should probably be published publicly already anyway.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • luca_beltrameluca_beltrame Posts: 10Member

    I hope I'm not "necro-posting": I actually wanted to add to the discussion but I went on holiday the moment this thread got moved. Feel free to delete this post if you feel this is the case. Like Martin said above me, I am contemplating the option of avoiding the use of GATK completely. I had integrated the previous version in a pipeline I was using internally (euphemism, since AFAIK in my small institution I'm one of the few doing this). Now, this pipeline uses a Free license, and due to the adjustment of GATK's licensing, I would no longer be able to distribute my software (you cannot mix Free and non-Free licenses). Additionally, there's no way I can contribute patches and so on now, even if I wanted to. Everything is solely in Broad's hands. I also share Martin's doubts about the reverse engineering and circumvention clauses in the license. I've said it before, but this is a huge disappointment and an equally huge step back for NGS analysis.

  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    Hi Luca,

    That's really a shame to hear. Why not stay with GATKLite, which meets all of your needs minus the premium tools? GATKLite is MIT licensed, just as GATK 1.6 was, and contains all baseline tools. We continue to get patches to the GATKLite tools and infrastructure as well.

    Basically, we were trapped keeping many advanced tools private to the Broad simply because we didn't want to release source at the same time as we made them available to the community. By decoupling source release from tool release, we are able to provide them to the community faster than if we had to wait until we were happy with a full source release. It seems to us that this is a win for everyone compared to the alternative, in place since at least GATK 1.2, in which advanced tools existed only at the Broad and only Broadies could access them.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • gdugdu Posts: 1Member
    edited September 2012

    Hi Mark/GATK team,

    First off, thanks for your contributions in making this mature piece of software, which is an invaluable gem for nextgen research worldwide. I would like to keep using GATK for variant calling since it has proven to be accurate & reliable, but I am a bit concerned about the exact nature of your licensing change and would like to describe my situation to see if my group can continue using it.

    We (the Genome Discovery Unit) are a group of 4 bioinformaticians at the sequencing core facility of the Australian National University in Canberra. Basically, we do downstream bioinformatic analysis of the data that comes off the sequencers, and operate on a cost-recovery model for our time. So although we do charge a small fee for our specific analyses, both our team and our customers are academic, not-for-profit entities within the local universities and government-funded scientific research institutes.

    Would we be able to use GATK 2.0 under the academic license? I sincerely hope for a favourable response, or we will have to seriously reconsider our variant-calling pipeline options.

    Warmest regards, Aaron Chuah, Genome Computational Specialist, Genome Discovery Unit, The John Curtin School of Medical Research, The Australian National University.

  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    The best option for you is to contact Issi Rozen at the Broad about this. He can provide exact confirmation. But my view is that if you consider yourselves an academic support group, there's no reason I can think of that you wouldn't qualify for the academic license.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 70Member
    edited October 2012

    Hi Mark, I have a couple of questions stemming from the FAQ posted by your new commercial partners:

    1. Regarding, "Why not stay with GATKLite?" According to the FAQ at your new partner's site "Broad has indicated that GATK-lite tools will soon be obsolete, and it plans to stop supporting the tools by the end of 2012." Can you confirm exactly what this means please? Is it all of GATK-lite which will be dropped or just tools which have been replaced by new ones from GATK2?

    2. "Use by a not-for-profit organization to generate revenue requires a commercial license", can you clarify what this means? For example providing sequencing services to other academic institutions generates revenue, however it is usually done at cost so does not generate profit.

    3. If a not-for-profit was interested in buying support, but not in buying a commercial license is there an option for this? Who would it be with?

    4. Finally, who will be the final arbiter of usage terms? Does that remain with the Broad or have you signed enforcement over to your partners?

    Thank you for all your hard work

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,407Administrator, GATK Developer admin
    edited October 2012

    Hi Martin,

    1. CORRECTION: The wording in the FAQ was incorrect due to a miscommunication. We will in fact continue providing support for GATK-Lite tools indefinitely, and although we will eventually stop providing a separate build (jar file), the GATK-Lite codebase will remain publicly available on our open source GitHub repository. In addition, tools from GATK 2 will be migrated into the GATK-Lite codebase over time.
    2. and 3. Please direct these questions to our partner, Appistry. They will be able to tell you based on your specific circumstances. They have a contact form that you can use, and within a few days they will also have a discussion forum that you can use for this purpose.
    3. See above.
    4. I believe Mark will answer that for you, or direct you to Issi Rozen here at the Broad, if you want an answer from our side. Otherwise I expect the Appistry people should be able to answer that as well.

    Geraldine Van der Auwera, PhD

  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    For 4 -- we don't discuss the terms of agreements between the Broad and third parties in general.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 70Member

    With regard to point 1, thank you for your clarification, I will pass this on. Likewise with 2 and 3.

    With regard to point 4, I don't really want to know the specifics of your confidential agreements. However, what I do need to know, if you can tell me without spilling the beans, is whether the Broad is still the right party to talk to about compliance issues with GATK for a not-for-profit. Specifically, are the assurances Broad gives about the new licensing arrangement, either via private communication or via these forums, binding, or do we need to seek those same assurances from Appistry too?

    Please understand, I'm not trying to be confrontational here; however, the fact that there is now a business dimension to this relationship in addition to the academic one makes things a little more complicated. This means I need to ensure we stay in compliance with our agreement with you, because you're such an important part of what we do.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Mark_DePristoMark_DePristo Posts: 153Administrator, GATK Developer admin

    I would recommend that you discuss 4 with Appistry and Issi (at the Broad) to determine where you fit in the non-commercial -> commercial spectrum. They can provide you with legally binding contracts for your specific situation.

    -- Mark A. DePristo, Ph.D. Co-Director, Medical and Population Genetics Broad Institute of MIT and Harvard
