Is it possible to improve compile time of GATK/Queue

dklevebring Member
edited June 2015 in Ask the GATK team

Hi,

I was wondering if it's possible to improve the compile time of GATK. I'm aware of

mvn verify -P\!queue 

to avoid building Queue when compiling gatk-public. However, when developing Queue, the compile time for gatk-protected is huge, which slows down development. In particular, the step

[INFO] Minimizing jar org.broadinstitute.gatk:gatk-queue-package-distribution:jar:3.3

takes a long time. Do you have any tricks up your sleeve to improve this during development? (I'm working on a custom JobRunner.)

thanks

Daniel

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    If it's any comfort we find it painful too :/

    We've talked about putting in some optimizations but I'm not sure what the status is. I'll check with the engineers...

  • kshakir Broadie, Dev ✭✭
    Accepted Answer

    TL;DR: mvn -Ddisable.shadepackage verify

    Background:

    In addition to Queue's GATK-wrapper codegen, relatively slow Scala compilation, etc., there's still a lot of legacy compatibility from our Ant days in the Maven scripts. Our mvn verify behaves much like running ant used to: it builds everything needed to bundle the GATK.

    As of GATK 3.4, the build for the "protected" code by default generates jar files that contain every class needed to run: one for the GATK and one for Queue. These are built by the Maven Shade plugin, and each is called a "package jar". However, there's also a way to generate a jar file that contains only META-INF/MANIFEST.MF pointers to the dependency jar files, instead of zipping/shading them all in. Each of these is called an "executable jar", and FYI, they are always generated, since building them takes seconds, not minutes.
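To see what an executable jar actually holds, you can list the dependency jars named in its manifest. This is a minimal sketch under a few assumptions: that the pointers live in the standard Class-Path manifest attribute (the usual Java mechanism for this), that unzip is installed, and that manifest_classpath is just a hypothetical helper name, not part of the GATK build. It first rejoins the manifest's wrapped continuation lines (which begin with a single space), then splits the attribute on spaces.

```shell
# manifest_classpath (hypothetical helper): read a MANIFEST.MF on stdin and
# print each entry of its Class-Path attribute on its own line.
#  1. tr drops the CRs (manifests written by Java tools typically use CRLF).
#  2. The first awk glues continuation lines (leading single space) back on.
#  3. The second awk keeps Class-Path and splits its space-separated jar list.
manifest_classpath() {
  tr -d '\r' |
    awk '
      /^ / { cur = cur substr($0, 2); next }
      { if (cur != "") print cur; cur = $0 }
      END { if (cur != "") print cur }
    ' |
    awk '/^Class-Path: / {
      sub(/^Class-Path: /, "")
      n = split($0, parts, " ")
      for (i = 1; i <= n; i++) print parts[i]
    }'
}
```

For example: unzip -p target/executable/Queue.jar META-INF/MANIFEST.MF | manifest_classpath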

    While developing and recompiling Queue, disable the shaded jar with -Ddisable.shadepackage, then run java -jar target/executable/Queue.jar ... with your usual arguments. If you need to move this jar to another machine or directory, you can't copy (or rsync) just the jar; you'll need the entire executable directory.
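Because the executable jar only points at its dependencies, it has to travel together with them. A minimal sketch, assuming a POSIX shell and that the dependency jars sit inside the executable directory and are referenced relative to it; copy_executable_dir is a hypothetical helper, not something the GATK build provides:

```shell
# copy_executable_dir (hypothetical helper): copy a build's executable/
# directory as one unit. The executable jar only points at the dependency
# jars alongside it, so copying Queue.jar by itself would leave those
# pointers dangling.
copy_executable_dir() {
  local src=$1 dest=$2
  if [ ! -d "$src" ]; then
    echo "copy_executable_dir: '$src' is not a directory" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  # "$src/." copies the directory's contents (hidden files included) into $dest.
  cp -R "$src/." "$dest/"
}
```

For example, copy_executable_dir target/executable /path/to/gatk-dev followed by java -jar /path/to/gatk-dev/Queue.jar --help (the destination path is a placeholder). To push the directory to another machine instead, rsync -a target/executable/ otherhost:/path/to/gatk-dev/ does the same job; the trailing slash on the source tells rsync to copy the directory's contents.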

    # Total expected time, on a local disk, with Queue:
    #   ~5.0 min from clean
    #   ~1.5 min per recompile
    mvn -Ddisable.shadepackage verify
    
    # always available
    java -jar target/executable/Queue.jar --help
    
    # not found when shading is disabled
    java -jar target/package/Queue.jar --help
    

    If you're only developing for the GATK, also skip Queue by adding -P\!queue.

    mvn -Ddisable.shadepackage -P\!queue verify
    
    # always available
    java -jar target/executable/GenomeAnalysisTK.jar --help
    
    # not found when the queue profile is disabled
    java -jar target/executable/Queue.jar --help
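If you switch between these two builds often, a small shell function can save some typing. This is a hypothetical sketch, not something that ships with the GATK: gatk_dev_build, its --no-queue option, and the DRY_RUN variable are made-up names, and it assumes a POSIX shell with mvn on the PATH, run from the root of the source checkout.

```shell
# gatk_dev_build (hypothetical wrapper): always skip the slow shaded
# "package" jar, and optionally skip Queue too with --no-queue.
# Set DRY_RUN=1 to print the Maven command instead of running it.
gatk_dev_build() {
  local no_queue=
  if [ "${1:-}" = "--no-queue" ]; then
    no_queue=1
  fi
  # Reuse the positional parameters as the argument list (POSIX sh has no arrays).
  set -- -Ddisable.shadepackage
  if [ -n "$no_queue" ]; then
    # Single quotes stop interactive shells from history-expanding "!",
    # the same job the backslash does in -P\!queue.
    set -- "$@" '-P!queue'
  fi
  set -- "$@" verify
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "mvn $*"
  else
    mvn "$@"
  fi
}
```

Running gatk_dev_build --no-queue is then equivalent to mvn -Ddisable.shadepackage -P\!queue verify, and DRY_RUN=1 gatk_dev_build --no-queue lets you check the command before committing to a multi-minute build.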
    
  • dklevebring Member

    Thanks @kshakir! Just what I was looking for!
