
Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)
edited September 2014 in Developer Zone

1. Overview

The Tribble project was started as an effort to overhaul our reference-ordered data system; we had many different formats that were shoehorned into a common framework that didn't really work as intended. What we wanted was a common framework that allowed for searching of reference-ordered data, regardless of the underlying type. Jim Robinson had developed indexing schemes for text-based files, which were incorporated into the Tribble library.

2. Architecture Overview

Tribble provides a lightweight interface and API for querying features and creating indexes from feature files, while allowing iteration over known feature files that we're unable to create indexes for. The main entry point for external users is the BasicFeatureReader class. It takes in a codec, an index file, and a file containing the features to be processed. With an instance of a BasicFeatureReader, you can query for features that span a specific location, or get an iterator over all the records in the file.
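
To make the two access patterns concrete, here is a minimal sketch of "query by location" versus "iterate over everything". It does not use Tribble itself: Interval and InMemoryReader are hypothetical names invented for this illustration, the coordinates are assumed to be one-based and closed, and a real reader would consult an index rather than scan every record.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical record type for illustration only; not a Tribble class.
final class Interval {
    final String chr;
    final int start; // 1-based, inclusive (assumption of this sketch)
    final int end;   // 1-based, inclusive (assumption of this sketch)

    Interval(String chr, int start, int end) {
        this.chr = chr;
        this.start = start;
        this.end = end;
    }

    // True when this interval shares at least one base with [qStart, qEnd] on the same contig.
    boolean overlaps(String qChr, int qStart, int qEnd) {
        return chr.equals(qChr) && start <= qEnd && end >= qStart;
    }
}

// Hypothetical in-memory stand-in for a feature reader. It scans linearly;
// an indexed reader would jump straight to the relevant part of the file.
final class InMemoryReader {
    private final List<Interval> records = new ArrayList<>();

    void add(Interval interval) {
        records.add(interval);
    }

    // Return every record that spans any part of the requested location.
    List<Interval> query(String chr, int start, int end) {
        List<Interval> hits = new ArrayList<>();
        for (Interval record : records) {
            if (record.overlaps(chr, start, end)) {
                hits.add(record);
            }
        }
        return hits;
    }

    // Iterate over all records in insertion order.
    Iterator<Interval> iterator() {
        return records.iterator();
    }
}
```

Because both ends are inclusive in this sketch, a query starting at position 200 still returns a record that ends at 200.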

3. Developer Overview

For developers, there are two important classes to implement: the FeatureCodec, which decodes lines of text and produces features, and the Feature class, which is your underlying record type. In more detail:

  • Feature

    This is the genomically oriented feature that represents the underlying data in the input file. For instance, in the VCF format, this is the variant call including quality information, the reference base, and the alternate base. The required information to implement a feature is the chromosome name, the start position (one-based), and the stop position. The start and stop positions represent a closed, one-based interval. For example, the first base in chromosome one would be chr1:1-1.

  • FeatureCodec

    This class takes in a line of text (from an input source, whether it's a file, a compressed file, or an HTTP link), and produces the above feature.

To add your new format to Tribble, you need to implement the two classes above (in an appropriately named subfolder in the Tribble check-out). The Feature object should know nothing about the file representation; it should represent the data as an in-memory object. The interface for a feature looks like:

public interface Feature {

    /**
     * Return the feature's reference sequence name, e.g. chromosome or contig
     */
    public String getChr();

    /**
     * Return the start position in 1-based coordinates (first base is 1)
     */
    public int getStart();

    /**
     * Return the end position following 1-based fully closed conventions.  The length of a feature is
     * end - start + 1;
     */
    public int getEnd();
}
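
As a rough illustration of the coordinate convention, the sketch below implements the three accessors from the interface shown above. SimpleFeature and its length() helper are hypothetical names introduced here, the interface is re-declared locally so the example compiles on its own, and the argument checks are an assumption of this sketch rather than something Tribble is known to enforce.

```java
// Local copy of the interface shown above so the sketch compiles on its own.
interface Feature {
    String getChr();
    int getStart();
    int getEnd();
}

// Hypothetical in-memory record; it knows nothing about any file format.
final class SimpleFeature implements Feature {
    private final String chr;
    private final int start; // 1-based, inclusive
    private final int end;   // 1-based, inclusive

    SimpleFeature(String chr, int start, int end) {
        // Assumption of this sketch: reject coordinates that cannot describe
        // a closed, 1-based interval.
        if (start < 1 || end < start) {
            throw new IllegalArgumentException(
                "Expected 1 <= start <= end, got " + start + "-" + end);
        }
        this.chr = chr;
        this.start = start;
        this.end = end;
    }

    @Override
    public String getChr() { return chr; }

    @Override
    public int getStart() { return start; }

    @Override
    public int getEnd() { return end; }

    // Length under the closed-interval convention: end - start + 1.
    int length() { return end - start + 1; }
}
```

With these definitions, new SimpleFeature("chr1", 1, 1) describes the first base of chromosome one and has a length of 1.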

And the interface for FeatureCodec:

/**
 * The base interface for classes that read in features.
 * @param <T> The feature type this codec reads
 */
public interface FeatureCodec<T extends Feature> {
    /**
     * Decode a line to obtain just its FeatureLoc for indexing -- contig, start, and stop.
     * @param line the input line to decode
     * @return the FeatureLoc encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public Feature decodeLoc(String line);

    /**
     * Decode a line as a Feature.
     * @param line the input line to decode
     * @return the Feature encoded by the line, or null if the line does not represent a feature (e.g. is
     * a comment)
     */
    public T decode(String line);

    /**
     * This function returns the object the codec generates.  This is allowed to be Feature in the case where
     * conditionally different types are generated.  Be as specific as you can though.
     * This function is used by reflection-based tools, so we can know the underlying type.
     * @return the feature type this codec generates.
     */
    public Class<T> getFeatureType();

    /**
     * Read and return the header, or null if there is no header.
     * @return header object
     */
    public Object readHeader(LineReader reader);
}
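
To show how the decode methods fit together, here is a minimal sketch of a codec for a made-up format. Everything in it is an assumption for illustration: the tab-separated layout (contig, start, end, optional name, already in one-based closed coordinates), the '#' comment convention, and the names SimpleCodec, RegionFeature and RegionCodec. The interfaces are simplified local stand-ins, and readHeader is left out because it depends on Tribble's LineReader.

```java
// Simplified local stand-ins for the interfaces shown above.
interface Feature {
    String getChr();
    int getStart();
    int getEnd();
}

interface SimpleCodec<T extends Feature> {
    Feature decodeLoc(String line);
    T decode(String line);
    Class<T> getFeatureType();
}

// Hypothetical record type produced by the codec below.
final class RegionFeature implements Feature {
    private final String chr;
    private final int start;
    private final int end;
    private final String name;

    RegionFeature(String chr, int start, int end, String name) {
        this.chr = chr;
        this.start = start;
        this.end = end;
        this.name = name;
    }

    @Override public String getChr() { return chr; }
    @Override public int getStart() { return start; }
    @Override public int getEnd() { return end; }
    String getName() { return name; }
}

// Hypothetical codec for lines of the form: contig<TAB>start<TAB>end[<TAB>name]
final class RegionCodec implements SimpleCodec<RegionFeature> {

    // The location is cheap to parse here, so indexing reuses the full decode.
    @Override
    public Feature decodeLoc(String line) {
        return decode(line);
    }

    @Override
    public RegionFeature decode(String line) {
        String trimmed = (line == null) ? "" : line.trim();
        // Blank lines and comments do not represent a feature.
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        String[] fields = trimmed.split("\t");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        int start = Integer.parseInt(fields[1].trim());
        int end = Integer.parseInt(fields[2].trim());
        String name = fields.length > 3 ? fields[3] : "";
        return new RegionFeature(fields[0], start, end, name);
    }

    @Override
    public Class<RegionFeature> getFeatureType() {
        return RegionFeature.class;
    }
}
```

Returning null for comment lines lets the caller skip them without special-casing the format, which matches the contract described in the interface comments.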

4. Supported Formats

The following formats are supported in Tribble:

  • VCF Format
  • DbSNP Format
  • BED Format
  • GATK Interval Format

5. Updating the Tribble, htsjdk, and/or Picard library

Updating the revision of Tribble on the system is a relatively straightforward task if the following steps are taken.

NOTE: Any directory starting with ~ may be different on your machine, depending on where you cloned the various repositories for gsa-unstable, picard, and htsjdk.

A Maven script to install picard into the local repository is located under gsa-unstable/private/picard-maven. To operate, it requires a symbolic link named picard pointing to a working checkout of the picard github repository. NOTE: compiling picard requires an htsjdk github repository checkout available at picard/htsjdk, either as a subdirectory or another symbolic link. The final full path should be gsa-unstable/private/picard-maven/picard/htsjdk.

cd ~/src/gsa-unstable
cd private/picard-maven
ln -s ~/src/picard picard

Create a git branch of Picard and/or htsjdk and make your changes. To install your changes into the GATK you must run mvn install in the private/picard-maven directory. This will compile and copy the jars into gsa-unstable/public/repo, and update gsa-unstable/gatk-root/pom.xml with the corresponding version. While you are making changes, your revision of picard and htsjdk will be labeled with -SNAPSHOT.

cd ~/src/gsa-unstable
cd private/picard-maven
mvn install

Continue testing in the GATK. Once your changes and updated tests for picard/htsjdk are complete, push your branch and submit your pull request to the Picard and/or htsjdk github. After your Picard/htsjdk patches are accepted, switch your Picard/htsjdk branches back to the master branch. NOTE: Leave your gsa-unstable branch on your development branch!

cd ~/src/picard
ant clean
git checkout master
git fetch
git rebase
cd htsjdk
git checkout master
git fetch
git rebase

NOTE: The version numbers of the old and new Picard/htsjdk will vary, and during active development they will end with -SNAPSHOT. If needed, you may push a -SNAPSHOT version for testing on Bamboo, but you should NOT submit a pull request with a -SNAPSHOT version. -SNAPSHOT indicates your local changes are not reproducible from source control.

When ready, run mvn install once more to create the non -SNAPSHOT versions under gsa-unstable/public/repo. In that directory, git add the new version, and git rm the old versions.

cd ~/src/gsa-unstable
cd public/repo
git add picard/picard/1.115.1499/
git add samtools/htsjdk/1.115.1509/
git rm -r picard/picard/1.112.1452/
git rm -r samtools/htsjdk/1.112.1452/

Commit and then push your gsa-unstable branch, then issue a pull request for review.



  • lindenb (France; Member)

    FYI, I just wrote a FeatureCodec for json-based annotations files: see

  • Carneiro (Charlestown, MA; Member)

    That is great news, linden. We are in the process of making Tribble a public open source library, and it would be great to incorporate your changes into it.

  • ArthurPGoldberg (MSSM; Member)
    edited September 2013

    How does one use Tribble to parse a bed file? I'm trying to run the HaplotypeCaller in GATK 2.7..., and sting/utils/interval/ throws the exception:
    org.broadinstitute.sting.utils.exceptions.ReviewedStingException: BED files must be parsed through Tribble; parsing them as intervals through the GATK engine is no longer supported


    Please cc response to

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Hi Arthur,

    I'm afraid we don't currently provide detailed support for development issues. I would recommend looking up a walker that takes a BED file as input to see how the code is used in that case.

    Also, I'm sorry but we don't send out email responses. Everything is done via the forum; if your email filters are getting in the way, you'll need to tweak them to make sure you receive the notification emails that are sent when someone responds to a comment.

  • juber (Member)

    What is the relationship/difference between tribble and tabix? Are both supported? Is one preferred over the other? They seem to have the same purpose.



  • Sheila (Broad Institute; Member, Broadie, Moderator)


    Hi Juber,

    Tribble is an I/O library we use internally. It has a wider range of functionality than tabix. Tribble actually uses tabix for certain operations. So they're not really redundant as far as I know.

