
Version highlights for GATK version 3.1

Geraldine_VdAuwera Posts: 5,856 Administrator, GATK Developer
edited March 20 in Announcements

This may seem crazy considering we released the big 3.0 version not two weeks ago, but yes, we have a new version for you already! It's a bit of a special case because this release is all about the hardware-based optimizations we had previously announced. What we hadn't announced yet was that this is the fruit of a new collaboration with a team at Intel (which you can read more about here), so we were waiting for everyone to be ready for the big reveal.


Intel inside GATK

So basically, the story is that we've started collaborating with the Intel Bio Team to enable key parts of the GATK to run more efficiently on certain hardware configurations. For our first project together, we tackled the PairHMM algorithm, which is responsible for a large proportion of the runtime of HaplotypeCaller analyses. The resulting optimizations, which are the main feature in version 3.1, produce significant speedups for HaplotypeCaller runs on a wide range of hardware.

We will continue working with Intel to further improve the performance of GATK tools that have historically been afflicted with performance issues and long runtimes (hello BQSR). As always, we hope these new features will make your life easier, and we welcome your feedback in the forum!

In practice

Note that these optimizations currently work on Linux systems only, and will not work on Mac or Windows operating systems. In the near future we will add support for Mac OS. We have no plans to add support for Windows since the GATK itself does not run on Windows.

Please note also that to take advantage of these optimizations, you need to opt in by adding the following flag to your GATK command: -pairHMM VECTOR_LOGLESS_CACHING.
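
For example, a typical HaplotypeCaller command with the opt-in flag added might look like this (the file names are placeholders for your own reference, BAM and output files):

java -jar GenomeAnalysisTK.jar \
    -T HaplotypeCaller \
    -R reference.fasta \
    -I sample.bam \
    -o output.vcf \
    -pairHMM VECTOR_LOGLESS_CACHING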

Here is a handy little table of the speedups you can expect depending on the hardware and operating system you are using. The configurations given here are the minimum requirements for benefiting from the expected speedup ranges shown in the third column. Keep in mind that these numbers are based on tests in controlled conditions; in the wild, your mileage may vary.

Linux kernel version   | Architecture / Processor                           | Expected speedup | Instruction set
Any 64-bit Linux       | Any x86 64-bit                                     | 1-1.5x           | Non-vector
Linux 2.6 or newer     | Penryn (Core 2 or newer)                           | 1.3-1.8x         | SSE 4.1
Linux 2.6.30 or newer  | Sandy Bridge (i3, i5, i7, Xeon E3, E5, E7 or newer)| 2-2.5x           | AVX

To find out exactly which processor is in your machine, you can run this command in the terminal:

$ cat /proc/cpuinfo | grep "model name"                                                                                    
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
model name  : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz

In this example, the machine has 4 physical cores with hyper-threading (8 hardware threads), so you see the answer 8 times. With the model name (here i7-2600) you can look up your hardware's relevant capabilities in the Wikipedia page on vector extensions.
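
You can also check the instruction-set flags reported by the CPU directly, which skips the model lookup entirely (sse4_1 and avx are the flags relevant to the table above; the output shown is what you would expect on the i7-2600 from the example):

$ grep -m 1 flags /proc/cpuinfo | grep -o -w -e sse4_1 -e avx
sse4_1
avx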

Alternatively, Intel has provided us with some links to lists of processors categorized by architecture, in which you can look up your hardware:

Penryn processors

Sandy Bridge processors

Finally, a few notes to clarify some concepts regarding Linux kernels vs. distributions and processors vs. architectures:

  • Sandy Bridge and Penryn are microarchitectures; essentially, these determine the sets of instructions built into the CPU. Core 2, Core i3, i5, i7 and Xeon E3, E5, E7 are processor families that implement a specific microarchitecture and thereby make use of the relevant improvements (see table above).

  • The Linux kernel version is not tied to the Linux distribution (e.g. Ubuntu, RedHat etc.); any distribution can use any kernel it wants. Each distribution ships with a default kernel, but covering those is beyond the scope of this article (there are at least 300 Linux distributions out there), and you can always install whatever kernel version you want.

  • Kernel version 2.6.30 was released in 2009, so we expect every sane person or IT department out there to be using something newer than that. To check which kernel you are running, see the one-liner below.

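You can check your kernel version with uname; the example output here is the kernel string from one of the systems discussed in the comments below:

$ uname -r
2.6.32-431.3.1.el6.x86_64
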
Post edited by Geraldine_VdAuwera

Geraldine Van der Auwera, PhD

Comments

  • blueskypy Posts: 191 Member
    edited March 19

    Sweet!!! This is what I have :), expecting a 2X speedup!

    -bash-4.1$ cat /proc/cpuinfo | grep "model name"

    model name : Intel(R) Xeon(R) CPU X5672 @ 3.20GHz
    (the same line is repeated 16 times, once per hardware thread)

    Post edited by blueskypy
  • blueskypy Posts: 191 Member
    edited March 19

    which GATK commands should -pairHMM VECTOR_LOGLESS_CACHING be added to?

    Post edited by blueskypy
  • Geraldine_VdAuwera Posts: 5,856 Administrator, GATK Developer

    HaplotypeCaller commands. In future we plan to enable other tools to take advantage of hardware optimizations. This is the objective of our budding collaboration with Intel.

    Geraldine Van der Auwera, PhD

  • TechnicalVault Sanger, Cambridge, UK Posts: 64 Member

    Could we get an "Instruction Set" column and the corresponding cpuinfo flags added to that table? It's easier than trying to remember which Intel processor came in what order.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • Geraldine_VdAuwera Posts: 5,856 Administrator, GATK Developer

    Hah, sure -- that was in the original draft but we removed it because we didn't think people would want to know. But happy to add it back.

    Geraldine Van der Auwera, PhD

  • whiteering Seoul, Korea Posts: 3 Member

    Is the 2-2.5x speedup on AVX-enabled machines for HaplotypeCaller only, or for the whole GATK pipeline? According to the poster presented at AGBT, speedups of 35x and 720x are expected for HaplotypeCaller on AVX-enabled Intel Xeon machines with 1 core and 24 cores, respectively. Would you please clarify the situation in a bit more detail?

  • Geraldine_VdAuwera Posts: 5,856 Administrator, GATK Developer

    @whiteering, the speedups available in 3.1 only affect the HaplotypeCaller. In future we will have speedups for other parts of the pipeline, but it will be a while yet before we can deliver those.

    Geraldine Van der Auwera, PhD

  • blueskypy Posts: 191 Member
    edited April 6

    I see the following "note" from HC with -pairHMM VECTOR_LOGLESS_CACHING

    FTZ enabled - may decrease accuracy if denormal numbers encountered

    Using SSE4.1 accelerated implementation of PairHMM

    Should users be worried about the "may decrease accuracy" part?

    Post edited by blueskypy
  • Geraldine_VdAuwera Posts: 5,856 Administrator, GATK Developer

    @blueskypy No, you don't need to worry about this at all. It's a leftover development note and will be removed in the next version.

    Geraldine Van der Auwera, PhD

  • adouble2 Posts: 10 Member

    We don't seem to see a significant speedup when running HaplotypeCaller with -pairHMM VECTOR_LOGLESS_CACHING. We seem to meet the requirements (Xeon CPU E5-2670, AVX, Linux 2.6.32), but performance actually decreased slightly (89 minutes without the pairHMM flag, 90 minutes with). Is there something else that could keep us from seeing a 2x speedup?

    Below is the edited output of the run with pairHMM just in case you spot something that I should have noticed.

    INFO  16:58:49,340 HelpFormatter - Program Args: -T HaplotypeCaller -R /ifs/data/bio/assemblies/H.sapiens/hg19/hg19.fasta -L chr2 --dbsnp data/dbsnp_135.hg19__ReTag.vcf --downsampling_type NONE --annotation AlleleBalanceBySample --annotation ClippingRankSumTest --read_filter BadCigar --num_cpu_threads_per_data_thread 12 --out TEST_CHR2_HaplotypeCaller.vcf -I TEST_group_1_CHR2_indelRealigned_recal.bam -I TEST_group_2_CHR2_indelRealigned_recal.bam -I TEST_group_3_CHR2_indelRealigned_recal.bam -I TEST_group_4_CHR2_indelRealigned_recal.bam -pairHMM VECTOR_LOGLESS_CACHING
    INFO  16:58:49,342 HelpFormatter - Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15.
    INFO  16:58:49,342 HelpFormatter - Date/Time: 2014/04/04 16:58:49
    INFO  16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
    INFO  16:58:49,342 HelpFormatter - --------------------------------------------------------------------------------
    INFO  16:58:49,783 GenomeAnalysisEngine - Strictness is SILENT
    INFO  16:58:49,876 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO  16:58:49,882 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  16:58:49,966 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.08
    INFO  16:58:50,027 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
    INFO  16:58:50,327 IntervalUtils - Processing 243199373 bp from intervals
    INFO  16:58:50,341 MicroScheduler - Running the GATK in parallel mode with 12 total threads, 12 CPU thread(s) for each of 1 data thread(s), of 32 processors available on this machine
    INFO  16:58:50,478 GenomeAnalysisEngine - Preparing for traversal over 4 BAM files
    INFO  16:58:51,173 GenomeAnalysisEngine - Done preparing for traversal
    INFO  16:58:51,173 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  16:58:51,173 ProgressMeter -        Location processed.active regions  runtime per.1M.active regions completed total.runtime remaining
    INFO  16:58:51,305 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    WARN  16:58:54,452 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
    FTZ enabled - may decrease accuracy if denormal numbers encountered
    Using AVX accelerated implementation of PairHMM
    WARN  16:58:54,510 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
    .
    .
    .
    INFO  18:29:01,584 ProgressMeter -  chr2:243199373        0.00e+00   90.2 m     8945.8 w    100.0%        90.2 m     0.0 s
    WARN  18:29:05,339 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!
    Time spent in setup for JNI call : 0.0
    Total compute time in PairHMM computeLikelihoods() : 0.0
    INFO  18:29:05,340 HaplotypeCaller - Ran local assembly on 68951 active regions
    INFO  18:29:05,884 ProgressMeter -            done        2.43e+08   90.2 m       22.0 s    100.0%        90.2 m     0.0 s
    INFO  18:29:05,884 ProgressMeter - Total runtime 5414.71 secs, 90.25 min, 1.50 hours
  • Kurt Posts: 126 Member ✭✭✭

    I have this same model as well and also didn't see the speedup, but I did see a speedup on older nodes/models that were SSE-enabled.

  • Kurt Posts: 126 Member ✭✭✭

    I'll have to clear it with the powers that be tomorrow, but I should be able to provide you with a BAM file or more, since some of them are HapMap samples. We have an Aspera license, so it might be faster to download them through that mechanism once we put them up on that server.

  • Carneiro Posts: 274 Administrator, GATK Developer

    Can any of you share the dataset you are working on so we can try to reproduce it here? In your logs, it seems like it didn't use the AVX version at all, since the "Total compute time in PairHMM computeLikelihoods()" is 0.0. Something must be wrong. I'm guessing it may have to do with the fact that this is an AMD machine and we haven't tested the platform identification on AMD (although it's supposed to be standardized...): "Executing as m@n06 on Linux 2.6.32-431.3.1.el6.x86_64 amd64"

  • Kurt Posts: 126 Member ✭✭✭

    @Carneiro I'm waiting for my IT group to put a few BAM files up on our Aspera server. In the meantime, I went back through the logs again and found that anytime -nct is used, the log always reflects "Total compute time in PairHMM computeLikelihoods() : 0.0". It appears this is also why that entry is in @adouble2's log. Anytime -nct is not specified (whether opting in with -pairHMM VECTOR_LOGLESS_CACHING or not), that entry is calculated and placed into the log. However, I still do not see a speedup on the AVX-enabled CPU models, while on the SSE-enabled models I did see a 20% increase. Currently in my pipeline I set both -nct and -pairHMM, because I did not see a decrease in wall clock time and I figured I might as well set them both in case this combination (-nct + -pairHMM) becomes enabled in 3.2.

    Both the SSE- and AVX-enabled nodes come out as "Executing as ...kernel amd64".

    However, for the Xeon CPU E5-2670 I do not get "Using AVX accelerated implementation of PairHMM" like @adouble2 did in the log, but for my older models I do get "Using SSE4.1 accelerated implementation of PairHMM".

  • Kurt Posts: 126 Member ✭✭✭

    @Carneiro, @Geraldine_VdAuwera,

    Just sent you an email regarding the files. Well, apparently I had the wrong email address for Mauricio, but I cc'd Geraldine on it.

    Best Regards,

    Kurt

  • kgururaj Hillsboro, OR Posts: 1 Member
    edited April 11

    @Kurt

    Hi Kurt, Mauricio forwarded me the log messages from your runs. For the SSE run, it seems the library loaded and the vector code executed correctly (SSE_NODE_LOGS/SSE.log). However, for the AVX logs (AVX_NODE_LOGS/200459444@0123857183.HAPLOTYPE.CALLER.AVX.log), the library appears not to have loaded at all. The HaplotypeCaller falls back to Java mode, which is why you do not see any performance improvement (slower compared to SSE).

    I would suggest running the HaplotypeCaller with some debug messages printed to see why the library was not loaded. Can you try running it on the AVX system with the additional argument "-l DEBUG"? You do not need to run the full HaplotypeCaller; the library is loaded within the first two minutes, and the log will state whether or not it was loaded. A quick check to see whether the library was loaded is to run:

    grep accelerated log_file
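
    If the vector library loaded successfully, this should turn up a line like the one in adouble2's log above:

    Using AVX accelerated implementation of PairHMM

    If the grep returns nothing, the HaplotypeCaller has fallen back to the pure Java implementation.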

    Also, the time printed in the log files:

    Total compute time in PairHMM computeLikelihoods() : time_in_seconds

    is valid only when using a single thread (NO -nct option).

    From your logs (NOTHING.log), when only Java is used for PairHMM (no vectorization):

    Total compute time in PairHMM computeLikelihoods() : 5752.94

    out of a total runtime of 18222.16 seconds. Thus, PairHMM consumed less than one-third of the total time.

    For SSE.log:

    Total compute time in PairHMM computeLikelihoods() : 1722.0774865170001

    out of a total runtime of 14287.14 seconds. Although the vectorized PairHMM kernel ran more than 3 times as fast as the Java kernel, the overall speedup is relatively small, since the other parts of HaplotypeCaller ran at the same speed as before.
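
    In other words, this is Amdahl's law: if PairHMM takes a fraction p of the baseline runtime and the vectorized kernel speeds that part up by a factor s, the best overall speedup you can get is 1 / ((1 - p) + p / s). A back-of-the-envelope check with the numbers above:

    p = 5752.94 / 18222.16 ≈ 0.32   (PairHMM share of the Java-only run)
    s = 5752.94 / 1722.08  ≈ 3.3    (Java kernel time / SSE kernel time)
    overall ≈ 1 / ((1 - 0.32) + 0.32 / 3.3) ≈ 1.29x

    which closely matches the observed total-runtime ratio of 18222.16 / 14287.14 ≈ 1.28x.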

    Post edited by kgururaj
  • Kurt Posts: 126 Member ✭✭✭

    Sure thing Karthik (@kgururaj), I will let you know sometime this weekend/early next week.

    Best,

    Kurt

  • Kurt Posts: 126 Member ✭✭✭

    @kgururaj,

    This is the only thing I can see so far regarding the library not being loaded:

    DEBUG 07:29:45,885 VectorLoglessPairHMM - libVectorLoglessPairHMM not found in JVM library path - trying to unpack from StingUtils.jar 
    DEBUG 07:29:45,890 PairHMMLikelihoodCalculationEngine$1 - Failed to load native library for VectorLoglessPairHMM - using Java implementation of LOGLESS_CACHING
  • kgururaj Hillsboro, OR Posts: 1 Member

    @Kurt Thanks for the log - the native library failed to load, which is why you do not see any speedup.

    A couple of checks:

    1. A sanity check, but just to be on the safe side: see whether the GATK jar file contains the native library:

    jar tf target/GenomeAnalysisTK.jar | grep libVector

    You should see:

    org/broadinstitute/sting/utils/pairhmm/libVectorLoglessPairHMM.so

    2. At runtime, the AVX library file is unbundled from the GATK jar and loaded. While the HaplotypeCaller is running on the server, do you see a file "/tmp/libVectorLoglessPairHMM*.so"? This file should be created by Java when it tries to load the library (assuming you have write permissions for the /tmp directory). See the one-liner check below.

    The library file should be available after the log file shows the following warning message:

    "VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation"

    Note that Java deletes the library file when it terminates. So, you should check while the HaplotypeCaller is running and after the warning message listed above is seen in the log.
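
    A quick way to do that check while the job is running (the JVM appends a random suffix to the unpacked file name, hence the wildcard):

    $ ls -l /tmp/libVectorLoglessPairHMM*.so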

  • croshong Posts: 1 Member

    While using VectorLoglessPairHMM in HaplotypeCaller, I see more than a 2x speedup, but the log always emits a warning like this:

    WARN 06:18:21,666 VectorLoglessPairHMM - WARNING: the VectorLoglessPairHMM is an experimental implementation still under active development. Use at your own risk!

    Does this mean that VectorLoglessPairHMM is still under development and there is some possible danger in using this option?

  • Sheila Broad Institute Posts: 279 Member, GATK Developer, Broadie, Moderator

    @croshong

    Hi,

    There is nothing to worry about now. The warning will be removed in the next version.

    -Sheila
