How can I submit a patch to the GATK codebase?
This article has been retired. If you would like to contribute a patch to GATK 3.*, please let us know in the forum. You may also consider contributing to the GATK4 codebase instead, which is where we are putting most of our development effort from here onward.
The GATK is an open source project that has greatly benefited from the contributions of outside users. The GATK team welcomes contributions from anyone who produces useful functionality in line with the goals of the toolkit. You are welcome to branch the GATK main repository and develop your own tools. Sometimes these tools may be useful to the GATK user community and you may want to make it part of the main GATK distribution. If so we ask you to follow our guidelines for submission of patches.
1. Good practices
There are a few good GIT practices that you should follow to simplify the ultimate goal, which is, adding your changes to the main GATK repository.
- Use branches.
- Every time you start new work that you are going to submit to the GATK team later, do it in a new branch. Make it a habit as this will simplify many of the following procedures and allow your master branch to always be a fresh (up to date) copy of the GATK main repository. Take a look on [[#How to create a new submission| how to create a new branch for submission]].
- Never merge.
- Merging creates a branched history with multiple parent nodes that make history hard to understand, impossible to modify and patches near-impossible to create. Merges are very useful when you need to combine multiple repositories and it should ''only'' be used when it makes sense. This means '''never merge''' and '''never pull''' (if it's not a fast-forward, or you will create a merge).
- Commit as often as possible.
- Every change, should be committed to make sure you can go back in time effectively in your own tree. The commit messages don't matter to us as long as they're meaningful to you in this stage. You can essentially do whatever you want in your local tree with your commits, as long as you don't merge.
- Rebase constantly
- Your branch is diverging from the master by the minute, so if you keep rebasing as often as you can, you will avoid major conflicts when it's time to send the patches. Take a look at our guide on [[#How to rebase | how to rebase]].
- Tell a meaningful story
- When it's time to submit your patches to us, reorder your commits and write meaningful commit messages. Each commit must be (as much as possible) self contained. These commits must tell a meaningful story to us so we can understand what it is you're adding to the codebase. Take a look at an [[#How to make your commits | example commit scenario]].
- Generate patches and email them to the group
- This part is super easy, provided you've followed the good practices. You just have to [[#How to generate the patches | generate the patches]] and e-mail them to email@example.com.
2. How to create a new submission
You should always start your code by creating a new branch from the most recent version of the main repository with :
git checkout master (make sure you are in the master branch) git fetch && git rebase origin/master (you can substitute this line for "git pull" if you have no changes in the master branch) git checkout -b newtool (create a new branch for your new tool)
Note: If you have submitted a patch to the group, do not continue development on the same branch as we cannot guarantee that your changes will make it to the main repository unchanged.
3. How to rebase
Every time before you rebase, you have to update your copy of the main repository. To do this use:
If you are just trying to keep up with the changes in the main repository after a fetch, you can rebase your branch at anytime using (and this should be all you need to do):
git rebase origin/master
In case there are conflicts, resolve them as you would and do:
git rebase --continue
If you don't know how to resolve the conflicts, you can always safely abort the whole process and go back to your branch before you started rebasing:
git rebase --abort
If you are done and want to generate your patches conforming to the latest repository changes, to edit, squash and reorder your commits use :
git rebase -i origin/master
At the prompt, you can follow the instructions to squash, edit and reorder accordingly. You can also do this step from IntelliJ with a visual editor that allows you to select what to edit/squash/reorder. You can also take a look at this nice tutorial on how to use interactive rebase.
4. How to make your commits
It is okay to have a list of commits (numbered) somewhat like this in your local tree:
- added function X
- fixed a b and c on X
- b was actually d
- started creating feature Y but had to go to the bathroom
- added Y
- found bug in X, fixed with e
- added Z
- fixed bug in Z with f
Before you can send your tools to us, you have to organize these commits so they tell a meaningful history and are self contained. To achieve this you will need to rebase so you can squash, edit and reorder your commits. This tree makes a lot of sense for your development process, but it makes no sense in the main repository history as it becomes hard to pick/revert commits and understand the history at a glance. After rebasing, you should edit your commits to look like this:
- added X (including commits 2, 3 and 6)
- added Y (including commits 4 and 5)
- added Z (including commits 7 and 8)
Use your commit messages wisely to help quick processing of your patches. Make sure the first line of your commit messages have less than 50 characters (title). Add a blank line and write a paragraph or more explaining what this commit represents (now that it is a package of multiple commits. It is important to have the 50 char title because this is all we see when we look at an extended history to find bugs and it is also our quick access to remember what the commit does to the repository.
A patch should be self contained. Meaning if we decide to adopt feature X and Z but not Y, we should be able to do so by only applying patches 1 and 2. If your patches are co-dependent, you should say so in the commits and justify why you didn't squash the commits together into one tool.
5. How to generate the patches
To generate patches, use :
git format-patch since
The since parameter is the last commit you want to generate patches from, for example: HEAD^3 will generate patches for HEAD^2, HEAD^1 and HEAD. You can also specify the commit by its id or by using the head of a branch. This is where using branches will make your life easier. If master is always up to date with the main repo with no changes, you can do:
git format-patch master (provided your master is up to date)
This will generate a patch for each commit you've created and you can simply e-mail them as an attachment to us.