
Installing GATK through Cygwin

jjzieve (Posts: 0, Member)
edited January 2013 in Ask the team

Hi y'all,

I don't have access to the specific instructions for installing GATK on a Windows platform (i.e. using Cygwin). If I could get permission, or if someone could walk me through this, I would be grateful.

Best,
Jacob Zieve
UC Davis


Answers

  • Geraldine_VdAuwera (Posts: 5,293; Administrator, GSA Member)

    Hi Jacob,

    We don't provide support for installing GATK on Windows machines, so I'm moving this to the "Ask the Community" section. Hopefully someone in our user community will be able to help you with this.

    Good luck!

    Geraldine Van der Auwera, PhD

  • evolvedmicrobe (MGH; Posts: 14; Member)

    Hi Jacob,

    Since the GATK is written in Java and doesn't have a GUI program associated with it, one might expect it to work on any operating system. I just tried running through the “How to run the GATK for the first time” tutorial on Windows without Cygwin, and it worked just fine. You might just go ahead and try running it normally from the Windows command prompt.

    I am not entirely sure why the GATK is described as POSIX-only. Skimming through the source code, it looks like the cluster-computing features would only work on a rather specialized Linux cluster, but since one would never use those features on a Mac either, I suspect those sections of the code aren’t the problem. There are also some standard C library calls in the code, but again, I can’t see why these would be needed for what you are likely doing on a personal computer. In any event, it might be worth running things on your machine and switching to a Linux setup if they don’t work. Cygwin, MinGW and SUA are all usable, but they are likely to be more hassle than they are worth.

    On a side note, the link for Cygwin instructions on this page, http://www.broadinstitute.org/gatk/about/#using-the-gatk, appears to be broken.

    Hope that helps, Nigel

  • Geraldine_VdAuwera (Posts: 5,293; Administrator, GSA Member)

    Hi folks,

    The reason we describe GATK as POSIX-only is that we have reason to believe some of the I/O functions and more advanced features might not work properly on other platforms, but we don't have the resources to look into this at the level of detail needed to give precise explanations. And since we have no experience running it on other platforms ourselves, we can't provide any support for that or commit any resources to doing so. We know some users do run it on Windows with various setups, including Cygwin, so we would like to document that at some point (that broken link is a placeholder for when we get around to it), but for now it just isn't a priority, sorry.

    Geraldine Van der Auwera, PhD

  • evolvedmicrobe (MGH; Posts: 14; Member)

    Hi All,

    I certainly understand the need to keep the problem focused. As an update, I found that the GATK cannot work properly on non-POSIX systems, and I would recommend against trying to use it on Windows (even with SUA, MinGW or Cygwin), as well as on Solaris, HP-UX and some other Unix-like operating systems, as it will likely cause problems.

    For anyone with a deeper interest, one problem is that the GATK relies on “advisory” file locking. As a brief explanation, an operating system often has to control how multiple programs access shared resources, like files on a hard drive. You can imagine the problems that might occur if two programs try to write to the same file at the same time. To avoid this, a program can request a “lock” on a file from the operating system, a mechanism meant to keep other programs from accessing that file at the same time.
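    In Java, a program asks the operating system for such a lock through `java.nio.channels.FileChannel`. A minimal sketch follows; the class `LockDemo`, the helper `tryExclusiveLock` and the temporary file are illustrative assumptions, not GATK code.

    ```java
    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class LockDemo {
        // Hypothetical helper: asks the OS for an exclusive lock on the whole file
        // and reports whether it was granted.
        static boolean tryExclusiveLock(Path path) throws IOException {
            try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 // tryLock() returns null if another program already holds an overlapping lock
                 FileLock lock = channel.tryLock()) {
                return lock != null && lock.isValid() && !lock.isShared();
            } // try-with-resources releases the lock, then closes the channel
        }

        public static void main(String[] args) throws IOException {
            Path tmp = Files.createTempFile("lock-demo", ".fai");
            System.out.println("lock granted: " + tryExclusiveLock(tmp));
            Files.deleteIfExists(tmp);
        }
    }
    ```

    Using try-with-resources ensures the lock is released even if an exception is thrown while it is held.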

    However, getting a lock means different things on POSIX and non-POSIX systems. On non-POSIX systems like Windows, if a program receives a lock on a file, the OS assumes that the program wants exclusive access to that file and prevents other programs from writing to it. On POSIX systems, however, the lock is called “advisory” because there is no enforcement mechanism behind it. On a POSIX system, a second program can, if it wants to, check that a first program holds a lock and refrain from writing to the file. But if it doesn’t care to play along with the lock the first program obtained from the OS, it can go write on ahead (pun intended :) ). There is nothing the first program can do about the second program writing to its file.
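    The difference can be observed with a small Java experiment. This sketch holds an exclusive lock through one handle, then writes through a second, independent handle that never checks for the lock; the second handle stands in for the “second program”. The class and method names are hypothetical, and the per-platform outcomes in the comments are what advisory versus enforced locking would predict, not results verified on every OS.

    ```java
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class AdvisoryLockDemo {
        // Returns true if a writer that ignores the lock still manages to write.
        static boolean writeIgnoringLock(Path path) throws IOException {
            try (FileChannel owner = FileChannel.open(path,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
                 FileLock lock = owner.lock()) { // exclusive lock over the whole file
                // A second handle that never asks whether the file is locked
                try (FileOutputStream intruder = new FileOutputStream(path.toFile(), true)) {
                    intruder.write("ignored the lock\n".getBytes(StandardCharsets.UTF_8));
                    return true;  // expected on POSIX systems: the lock is only advisory
                } catch (IOException e) {
                    return false; // expected on Windows: the OS enforces the lock
                }
            }
        }

        public static void main(String[] args) throws IOException {
            Path tmp = Files.createTempFile("advisory-demo", ".txt");
            System.out.println("second writer succeeded: " + writeIgnoringLock(tmp));
            Files.deleteIfExists(tmp);
        }
    }
    ```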

    Because parts of the GATK ignore file locks, they will not work on operating systems that enforce them. For example, if you run a tool on a reference FASTA file that has no index file, the GATK will try to create the index for you. However, this fails because of a lock conflict: the GATK creates one File object for the index and puts a lock on it, and later in the code it opens a second handle to the same file and writes through it. The code snippet below from the GATK works fine on POSIX systems but fails on Windows:

    final File indexFile = new File(faiFile);
    FSLockWithShared indexLock = new FSLockWithShared(indexFile, true);
    indexLock.exclusiveLock(); // Makes a new file and locks it
    // Fails on Windows: can't open another writer on the same file once a lock has been handed out for it
    out = new BufferedWriter(new FileWriter(faiFile));
    

    This type of code destroys cross-platform compatibility, and because it is a pretty low-level issue, I would avoid trying to get the GATK to work on non-POSIX systems with Cygwin or similar workarounds altogether.

    Anyway, again, I certainly understand the need to keep the problem simple for the GATK, and from what I hear it works fantastically where it is supposed to, which is definitely the most important thing. However, in case some future community group wants to attempt a cross-platform implementation, I would suggest opening only one handle to a file at a time and respecting file locks. For instance, this piece of code also locks the file but is cross-platform compatible:

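    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.nio.channels.FileLock;
    import java.nio.charset.StandardCharsets;

    String f = "Test4.fai";
    // One stream is the only handle on the file; the lock is taken on its own channel
    try (FileOutputStream fos = new FileOutputStream(f);
         FileLock fl = fos.getChannel().tryLock(0, Long.MAX_VALUE, false)) { // exclusive, whole file
        if (fl != null) { // null: another program already holds a lock, so don't write
            BufferedOutputStream bw = new BufferedOutputStream(fos);
            bw.write("test".getBytes(StandardCharsets.UTF_8));
            bw.flush(); // push the bytes through fos before the lock is released
        }
    } // fl is released first, then fos (and its channel) is closed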
    

    Cheers, Nigel

  • Geraldine_VdAuwera (Posts: 5,293; Administrator, GSA Member)

    Nigel, thanks for providing this detailed explanation! FYI, we do welcome code patches from the community if there is an improvement that is desired but not currently on our priority list...

    Geraldine Van der Auwera, PhD
