The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Get notifications!


You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

Formatting tip!


Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block as demonstrated here.

Jump to another community
Picard 2.9.4 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

Writing walkers in Scala

Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie
edited October 2012 in Developer Zone

1. Install scala somewhere

At the Broad, we typically put it somewhere like this:

/home/radon01/depristo/work/local/scala-2.7.5.final

Next, create a symlink from this directory to trunk/scala/installation:

ln -s /home/radon01/depristo/work/local/scala-2.7.5.final trunk/scala/installation

2. Setting up your path

Right now the only way to get scala walkers into the GATK is by explicitly setting your CLASSPATH in your .my.cshrc file:

setenv CLASSPATH /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/FourBaseRecaller.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GenomeAnalysisTK.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/Playground.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/StingUtils.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/bcel-5.2.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/colt-1.2.0.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/google-collections-0.9.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/javassist-3.7.ga.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/junit-4.4.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/log4j-1.2.15.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/picard-1.02.63.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/picard-private-875.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/reflections-0.9.2.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/sam-1.01.63.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/simple-xml-2.0.4.jar:/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar

Really this needs to be manually updated whenever any of the libraries are updated. If you see this error:

Caused by: java.lang.RuntimeException: java.util.zip.ZipException: error in opening zip file
        at org.reflections.util.VirtualFile.iterable(VirtualFile.java:79)
        at org.reflections.util.VirtualFile$5.transform(VirtualFile.java:169)
        at org.reflections.util.VirtualFile$5.transform(VirtualFile.java:167)
        at org.reflections.util.FluentIterable$3.transform(FluentIterable.java:43)
        at org.reflections.util.FluentIterable$3.transform(FluentIterable.java:41)
        at org.reflections.util.FluentIterable$ForkIterator.computeNext(FluentIterable.java:81)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.util.FluentIterable$FilterIterator.computeNext(FluentIterable.java:102)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.util.FluentIterable$TransformIterator.computeNext(FluentIterable.java:124)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:132)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:127)
        at org.reflections.Reflections.scan(Reflections.java:69)
        at org.reflections.Reflections.<init>(Reflections.java:47)
        at org.broadinstitute.sting.utils.PackageUtils.<clinit>(PackageUtils.java:23)

It's because the libraries aren't updated. Basically just do an ls of your trunk/dist directory after the GATK has been build, make this your classpath as above, and tack on:

/humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar

A command that almost works (but you'll need to replace the spaces with colons) is:

#setenv CLASSPATH $CLASSPATH `ls /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/*.jar` /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar:/humgen/gsa-scr1/depristo/local/scala-2.7.5.final/lib/scala-library.jar

3. Building scala code

All of the Scala source code lives in scala/src, which you build using ant scala

There are already some example Scala walkers in scala/src, so doing a standard checkout, installing scala, settting up your environment, should allow you to run something like:

gsa2 ~/dev/GenomeAnalysisTK/trunk > ant scala
Buildfile: build.xml

init.scala:

scala:
     [echo] Sting: Compiling scala!
   [scalac] Compiling 2 source files to /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/scala/classes
   [scalac] warning: there were deprecation warnings; re-run with -deprecation for details
   [scalac] one warning found
   [scalac] Compile suceeded with 1 warning; see the compiler output for details.
   [delete] Deleting: /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar
      [jar] Building jar: /humgen/gsa-scr1/depristo/dev/GenomeAnalysisTK/trunk/dist/GATKScala.jar

4. Invoking a scala walker

Until we can include Scala walkers along with the main GATK jar (avoiding the classpath issue too) you have to invoke your scala walkers using this syntax:

java -Xmx2048m org.broadinstitute.sting.gatk.CommandLineGATK -T BaseTransitionTableCalculator -R /broad/1KG/reference/human_b36_both.fasta -I /broad/1KG/DCC_merged/freeze5/NA12878.pilot2.SLX.bam -l INFO -L 1:1-100

Here, the BaseTransitionTableCalculator walker is written in Scala and being loaded into the system by the GATK walker manager. Otherwise everything looks like a normal GATK module.

Post edited by Geraldine_VdAuwera on
Sign In or Register to comment.