
# Version highlights for GATK version 3.1

Cambridge, MA · Posts: 11,743 · admin
edited March 2014

This may seem crazy considering we released the big 3.0 version not two weeks ago, but yes, we have a new version for you already! It's a bit of a special case because this release is all about the hardware-based optimizations we had previously announced. What we hadn't announced yet was that this is the fruit of a new collaboration with a team at Intel (which you can read more about here), so we were waiting for everyone to be ready for the big reveal.

### Intel inside GATK

So basically, the story is that we've started collaborating with the Intel Bio Team to enable key parts of the GATK to run more efficiently on certain hardware configurations. For our first project together, we tackled the PairHMM algorithm, which is responsible for a large proportion of the runtime of HaplotypeCaller analyses. The resulting optimizations, which are the main feature in version 3.1, produce significant speedups for HaplotypeCaller runs on a wide range of hardware.

We will continue working with Intel to further improve the performance of GATK tools that have historically been afflicted with performance issues and long runtimes (hello BQSR). As always, we hope these new features will make your life easier, and we welcome your feedback in the forum!

### In practice

Note that these optimizations currently work on Linux systems only, and will not work on Mac or Windows operating systems. In the near future we will add support for Mac OS. We have no plans to add support for Windows since the GATK itself does not run on Windows.

Please note also that to take advantage of these optimizations, you need to opt in by adding the following flag to your GATK command: -pairHMM VECTOR_LOGLESS_CACHING.
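For example, a full HaplotypeCaller invocation with the opt-in flag might look like the sketch below. The jar location and the reference/BAM/output paths are hypothetical placeholders, not taken from any post in this thread:

```shell
#!/bin/sh
# Sketch of a GATK 3.x HaplotypeCaller run with the vectorized PairHMM
# enabled via the opt-in flag. All file paths are placeholders.
cmd="java -jar GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -R reference.fasta \
  -I sample.bam \
  -o sample.vcf \
  -pairHMM VECTOR_LOGLESS_CACHING"
echo "$cmd"
```

Without the flag, the HaplotypeCaller falls back to the default Java PairHMM implementation.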

Here is a handy little table of the speedups you can expect depending on the hardware and operating system you are using. The configurations given here are the minimum requirements for benefiting from the expected speedup ranges shown in the third column. Keep in mind that these numbers are based on tests in controlled conditions; in the wild, your mileage may vary.

| Linux kernel version | Architecture / Processor | Expected speedup | Instruction set |
| --- | --- | --- | --- |
| Any 64-bit Linux | Any x86 64-bit | 1-1.5x | Non-vector |
| Linux 2.6 or newer | Penryn (Core 2 or newer) | 1.3-1.8x | SSE 4.1 |
| Linux 2.6.30 or newer | Sandy Bridge (i3, i5, i7, Xeon E3, E5, E7 or newer) | 2-2.5x | AVX |

To find out exactly which processor is in your machine, you can run this command in the terminal:

```
$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
```

In this example, the machine has 4 cores (8 threads), so you see the answer 8 times. With the model name (here i7-2600) you can look up your hardware's relevant capabilities in the Wikipedia page on vector extensions. Alternatively, Intel has provided us with some links to lists of processors categorized by architecture, in which you can look up your hardware:

#### Penryn processors

#### Sandy Bridge processors

Finally, a few notes to clarify some concepts regarding Linux kernels vs. distributions and processors vs. architectures:

• Sandy Bridge and Penryn are microarchitectures; essentially, these are sets of instructions built into the CPU. Core 2, Core i3, i5, i7, and Xeon E3, E5, E7 are the processors that implement a specific microarchitecture to make use of the relevant improvements (see table above).
• The Linux kernel has no connection with the Linux distribution (e.g. Ubuntu, RedHat, etc.); any distribution can use any kernel it wants. There are default kernels shipped with each distribution, but covering those is beyond the scope of this article (there are at least 300 Linux distributions out there), and you can always install whatever kernel version you want.
• Kernel version 2.6.30 was released in 2009, so we expect every sane person or IT department out there to be using something newer than that.

Geraldine Van der Auwera, PhD
Post edited by Geraldine_VdAuwera

## Comments

• Posts: 266 ✭✭
edited March 2014

Sweet!!! This is what I have, expecting a 2X speedup!

```
-bash-4.1$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
```
• Posts: 266 ✭✭
edited March 2014

Which GATK commands should -pairHMM VECTOR_LOGLESS_CACHING be added to?

• Cambridge, MA · Posts: 11,743 · admin

HaplotypeCaller commands. In future we plan to enable other tools to take advantage of hardware optimizations. This is the objective of our budding collaboration with Intel.

Geraldine Van der Auwera, PhD

• Cambridge, UK · Posts: 111 ✭✭✭

Could we get an "Instruction Set" column and the corresponding cpuinfo flags added to that table? It's easier than trying to remember which Intel processor came in what order.

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

• Cambridge, MA · Posts: 11,743 · admin

Hah, sure -- that was in the original draft but we removed it because we didn't think people would want to know. But happy to add it back.

Geraldine Van der Auwera, PhD
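In the meantime, the mapping from cpuinfo flags to the instruction sets in the table can be sketched as a small helper; the function name and the sample flags strings below are illustrative, not part of GATK:

```shell
# detect_simd: given the "flags" line content from /proc/cpuinfo, report the
# best vector instruction set relevant to the table above (AVX > SSE 4.1).
detect_simd() {
    case " $1 " in
        *" avx "*)    echo "AVX" ;;
        *" sse4_1 "*) echo "SSE 4.1" ;;
        *)            echo "Non-vector" ;;
    esac
}

# On a live Linux machine you would feed it the real flags line, e.g.:
#   detect_simd "$(grep -m1 '^flags' /proc/cpuinfo)"
detect_simd "fpu vme sse sse2 ssse3 sse4_1 sse4_2 avx"
```

On an i7-2600 (Sandy Bridge) this would report AVX; on a Penryn-era Core 2 it would report SSE 4.1.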

• Seoul, Korea · Posts: 3

Is the speedup of 2-2.5x on AVX-enabled machines for HaplotypeCaller only, or for the whole GATK pipeline? According to the poster presented at AGBT, 35X and 720X speedups are expected for HaplotypeCaller on AVX-enabled Intel Xeon machines with 1 core and 24 cores, respectively. Would you please clarify the situation in a bit more detail?

• Cambridge, MA · Posts: 11,743 · admin

@whiteering‌, the speedups available in 3.1 only affect the HaplotypeCaller. In future we will have speedups for other parts of the pipeline, but it will be a while yet before we can deliver those.

Geraldine Van der Auwera, PhD

• Posts: 266 ✭✭
edited April 2014

I see the following "note" from HC with -pairHMM VECTOR_LOGLESS_CACHING:

```
FTZ enabled - may decrease accuracy if denormal numbers encountered
Using SSE4.1 accelerated implementation of PairHMM
```

Should users be worried about the "may decrease accuracy" part?

• Cambridge, MA · Posts: 11,743 · admin

@blueskypy No, you don't need to worry about this at all. It's a leftover development note and will be removed in the next version.

Geraldine Van der Auwera, PhD

• Posts: 13

We don't seem to see a significant speed-up when running HaplotypeCaller with -pairHMM VECTOR_LOGLESS_CACHING. We seem to meet the requirements (Xeon CPU E5-2670, AVX, Linux 2.6.32), but the performance actually decreased slightly (89 minutes without the pairHMM flag, 90 minutes with). Is there something else that could keep us from seeing a 2x speed-up?

Below is the edited output of the run with pairHMM just in case you spot something that I should have noticed.

```
INFO 16:58:49,340 HelpFormatter - Program Args: -T HaplotypeCaller -R /ifs/data/bio/assemblies/H.sapiens/hg19/hg19.fasta -L chr2 --dbsnp data/dbsnp_135.hg19__ReTag.vcf --downsampling_type NONE --annotation AlleleBalanceBySample --annotation ClippingRankSumTest --read_filter BadCigar --num_cpu_threads_per_data_thread 12 --out TEST_CHR2_HaplotypeCaller.vcf -I TEST_group_1_CHR2_indelRealigned_recal.bam -I TEST_group_2_CHR2_indelRealigned_recal.bam -I TEST_group_3_CHR2_indelRealigned_recal.bam -I TEST_group_4_CHR2_indelRealigned_recal.bam -pairHMM VECTOR_LOGLESS_CACHING
INFO 16:58:49,342 HelpFormatter - Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
INFO 16:58:49,342 HelpFormatter - Date/Time: 2014/04/04 16:58:49
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
INFO 16:58:49,783 GenomeAnalysisEngine - Strictness is SILENT
INFO 16:58:49,876 GenomeAnalysisEngine - Downsampling Settings: No downsampling
INFO 16:58:49,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 16:58:49,966 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
INFO 16:58:50,027 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
INFO 16:58:50,327 IntervalUtils - Processing 243199373 bp from intervals
INFO 16:58:50,341 MicroScheduler - Running the GATK in parallel mode with 12 total threads, 12 CPU thread(s) for each of 1 data thread(s), of 32 processors available on this machine
INFO 16:58:50,478 GenomeAnalysisEngine - Preparing for traversal over 4 BAM files
INFO 16:58:51,173 GenomeAnalysisEngine - Done preparing for traversal
INFO 16:58:51,173 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO 16:58:51,173 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining
INFO 16:58:51,305 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
WARN 16:58:54,452 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
FTZ enabled - may decrease accuracy if denormal numbers encountered
Using AVX accelerated implementation of PairHMM
WARN 16:58:54,510 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
...
INFO 18:29:01,584 ProgressMeter - chr2:243199373 0.00e+00 90.2 m 8945.8 w 100.0% 90.2 m 0.0 s
WARN 18:29:05,339 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
Time spent in setup for JNI call : 0.0
Total compute time in PairHMM computeLikelihoods() : 0.0
INFO 18:29:05,340 HaplotypeCaller - Ran local assembly on 68951 active regions
INFO 18:29:05,884 ProgressMeter - done 2.43e+08 90.2 m 22.0 s 100.0% 90.2 m 0.0 s
INFO 18:29:05,884 ProgressMeter - Total runtime 5414.71 secs, 90.25 min, 1.50 hours
```

• Posts: 255 ✭✭✭

I have this same model as well and also didn't see the speed-up, but I did see a speed-up on older nodes/models that were SSE-enabled.

• Posts: 255 ✭✭✭

I'll have to clear it with the powers that be tomorrow, but I should be able to provide you with a bam file or more, since some of them are hapmap samples. We have an aspera license, so it might be faster to download them through that mechanism once we put them up on that server.

• Charlestown, MA · Posts: 274 · admin

Can any of you share the dataset you are working on so we can try to reproduce it here? In your logs, it seems like it didn't use the AVX version at all, since "Total compute time in PairHMM computeLikelihoods() : 0.0". Something must be wrong. I'm guessing it may have to do with the fact that this is an AMD machine and we haven't tested the platform identification on AMD (although it's supposed to be standardized...): "Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64"

• Posts: 255 ✭✭✭

@Carneiro
I'm waiting for my IT group to put a few bam files up on our aspera server. In the meantime, I went back through the logs again and found that any time -nct is used, the log always reflects "Total compute time in PairHMM computeLikelihoods() : 0.0". It appears this is also why that line is in @adouble2's log. Any time -nct is not specified (whether opting in with -pairHMM VECTOR_LOGLESS_CACHING or not), that entry is calculated and placed into the log. However, I still do not see a speed-up on the AVX-enabled CPU models, while on the SSE-enabled models I did see a 20% increase. Currently in my pipeline I set both -nct and -pairHMM, because I did not see a decrease in wall-clock time and I figured I might as well set them both in case this combination (-nct + -pairHMM) becomes enabled in 3.2.

Both the SSE and AVX enabled nodes come out as "Executing as ...kernel amd64"

However, for the Xeon CPU E5-2670, I do not get "Using AVX accelerated implementation of PairHMM" like @adouble2 did in the log, but for my older models I do get "Using SSE4.1 accelerated implementation of PairHMM"

• Posts: 255 ✭✭✭

Just sent you an email regarding the files. Well, apparently I had the wrong email address for Mauricio, but I cc'd Geraldine on it.

Best Regards,

Kurt

• Hillsboro, OR · Posts: 1
edited April 2014

@Kurt‌

Hi Kurt, Mauricio forwarded me the log messages from your runs. For the SSE run, it seems the library loaded and the vector code executed correctly (SSE_NODE_LOGS/SSE.log). However, for the AVX logs (AVX_NODE_LOGS/200459444@0123857183.HAPLOTYPE.CALLER.AVX.log), the library appears not to have loaded at all. The HaplotypeCaller falls back to Java mode, which is why you do not see any performance improvement (slower compared to SSE).

I would suggest running the HaplotypeCaller with some debug messages printed to see why the library was not loaded. Can you try running it on the AVX system with the additional arguments: "-l DEBUG"? You do not need to run the full HaplotypeCaller, the library is loaded within the first two minutes and will print out information about whether the library was loaded or not. A quick check to see whether the library was loaded is to run:

grep accelerated log_file

Also, the time printed in the log files:

Total compute time in PairHMM computeLikelihoods() : time_in_seconds

is valid only when using a single thread (NO -nct option).

From your logs (NOTHING.log), when only Java is used for PairHMM (no vectorization):

Total compute time in PairHMM computeLikelihoods() : 5752.94

out of a total time of 18222.16. Thus, PairHMM consumed less than one-third of the total time.

For SSE.log

Total compute time in PairHMM computeLikelihoods() : 1722.0774865170001

out of a total time of 14287.14. Although the vectorized PairHMM kernel ran more than 3 times as fast as the Java kernel, the overall speedup is relatively small since the other parts of HaplotypeCaller ran at the same speed as before.
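That behavior is what Amdahl's law predicts: if PairHMM is a fraction f of total runtime and the kernel runs s times faster, the overall speedup is 1/((1-f) + f/s). As a quick sanity check using the timings above (this calculation is mine, not from the original post):

```shell
# Amdahl's law applied to the posted timings:
#   Java-only run: PairHMM took 5752.94 s of 18222.16 s total  -> f ~ 0.32
#   SSE run:       PairHMM kernel took 1722.08 s               -> s ~ 3.3
awk 'BEGIN {
    f = 5752.94 / 18222.16    # fraction of runtime spent in PairHMM
    s = 5752.94 / 1722.08     # kernel speedup from vectorization
    printf "predicted overall speedup: %.2fx\n", 1 / ((1 - f) + f / s)
    printf "observed overall speedup:  %.2fx\n", 18222.16 / 14287.14
}'
```

Both come out around 1.28x, which matches the modest end-to-end improvement despite the 3x-faster kernel.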

• Posts: 255 ✭✭✭

Sure thing Karthik, @kgururaj‌ , i will let u know sometime this weekend/early next week.

Best,

Kurt

• Posts: 255 ✭✭✭

this is the only thing that I can see so far in regards to the library not being loaded.

```
DEBUG 07:29:45,885 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from StingUtils.jar
DEBUG 07:29:45,890 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING
```

• Hillsboro, OR · Posts: 1

@Kurt‌
Thanks for the log - the native library failed to load which is why you do not see any speedup.

A couple of checks:

1. Insane check, but just to be on the safe side, see whether the GATK jar file contains the native library:

jar tf target/GenomeAnalysisTK.jar | grep libVector

You should see:

2. At runtime, the AVX library file is unbundled from the GATK jar and loaded. While the HaplotypeCaller is running on the server, do you see a file "/tmp/libVectorLoglessPairHMM*.so"? This file should be created by Java when it tries to load the library (assuming you have write permissions for the /tmp directory).

The library file should be available after the log file shows the following warning message:

"VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation"

Note that Java deletes the library file when it terminates. So, you should check while the HaplotypeCaller is running and after the warning message listed above is seen in the log.
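The second check above can be scripted. The helper below just scans a directory for the unpacked library; the function name is mine, and the demo uses a scratch directory as a stand-in for /tmp during a live run:

```shell
# Check whether the unpacked native PairHMM library is present in a directory.
# During a real HaplotypeCaller run you would call: check_pairhmm_lib /tmp
check_pairhmm_lib() {
    dir="$1"
    if ls "$dir"/libVectorLoglessPairHMM*.so >/dev/null 2>&1; then
        echo "native library found in $dir"
    else
        echo "native library NOT found in $dir (Java fallback likely)"
    fi
}

# Demo against a scratch directory standing in for /tmp on a running node:
demo=$(mktemp -d)
touch "$demo/libVectorLoglessPairHMM1234.so"
check_pairhmm_lib "$demo"
rm -r "$demo"
```

Remember to run the check while the HaplotypeCaller is still running, since the file is deleted when the JVM exits.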

• Posts: 1

While using VectorLoglessPairHMM in HaplotypeCaller, I see more than a 2X speedup, but the log file always emits a warning like this:

```
WARN 06:18:21,666 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
```

Does this mean that VectorLoglessPairHMM is still under development and there is some possible danger in using this option?

@croshong‌

Hi,

There is nothing to worry about now. The warning will be removed in the next version.

-Sheila

• Basel · Posts: 3
edited February 10

I tried to run GATK 3.7's HaplotypeCaller with -pairHMM VECTOR_LOGLESS_CACHING on Mac OS X 10.11.6 and can't get it to work. My running log is attached as a png file. Any ideas what I am doing wrong?