
Crashes with segmentation fault in shipped `libVectorLoglessPairHMM.so`

Dear GATK people,

Our scientists reported that GATK 3.6 and 3.7 terminate with a segmentation fault on many of our Intel systems. Oddly, it is not reproducible on all of them, even though they are the same model and run the same operating system.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f6c2fa64ce9, pid=4189, tid=0x00007fa8dfc34700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_131-b11) (build 1.8.0_131-b11)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 )
# Problematic frame:
# C  [libVectorLoglessPairHMM9026850853863068944.so+0x1bce9]  LoadTimeInitializer::LoadTimeInitializer()+0x1669
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /scratch/local/joey/workdir/hs_err_pid4189.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

It also happens with older Java versions.

# JRE version: Java(TM) SE Runtime Environment (8.0_25-b17) (build 1.8.0_25-b17)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.25-b02 mixed mode linux-amd64 )

It always crashes at the same offset inside the library (+0x1bce9, in LoadTimeInitializer::LoadTimeInitializer()). Here is the top of the error report.

[…]
---------------  T H R E A D  ---------------

Current thread (0x00007fa8d8009800):  JavaThread "main" [_thread_in_native, id=4190, stack(0x00007fa8dfb34000,0x00007fa8dfc35000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f6a2fd91680

Registers:
RAX=0xffffffff80000000, RBX=0x0000000000000000, RCX=0x0000000000000000, RDX=0x0000000000000001
RSP=0x00007fa8dfc319f0, RBP=0x0000000000000000, RSI=0x00007f6c2fddf8a0, RDI=0x0000000000000000
R8 =0x00007f6c2fa90c80, R9 =0x00007f6c2fa90dd0, R10=0x000000000000009c, R11=0x00007f6c2fa69450
R12=0x0000000000000002, R13=0x00007f6c2fd91680, R14=0x00000000000138ff, R15=0x00007f6c2fddf8a4
RIP=0x00007f6c2fa64ce9, EFLAGS=0x0000000000010247, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e
[…]
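
Since the unpacked library gets a different numeric suffix and load address on each run, comparing crashes by library offset is easier than comparing raw addresses. Here is a minimal sketch for pulling that information out of an hs_err report. The function name `parse_crash` and the regular expressions are illustrative assumptions based only on the lines quoted in this post, not a general parser for HotSpot error logs.

```python
import re

# Matches the "Problematic frame" entry, e.g. "[libFoo123.so+0x1bce9]  symbol".
FRAME_RE = re.compile(
    r"\[(?P<lib>[^\]\s]+?)\+(?P<offset>0x[0-9a-fA-F]+)\]\s+(?P<symbol>.*)"
)
# Matches the program counter reported on the SIGSEGV line.
PC_RE = re.compile(r"pc=(0x[0-9a-fA-F]+)")
# The library is unpacked under a numeric suffix; strip it for comparison.
SUFFIX_RE = re.compile(r"\d+(?=\.so$)")


def parse_crash(report: str) -> dict:
    """Extract library name, offset, symbol, and load base from a report."""
    pc = int(PC_RE.search(report).group(1), 16)
    frame = FRAME_RE.search(report)
    offset = int(frame.group("offset"), 16)
    return {
        "library": SUFFIX_RE.sub("", frame.group("lib")),
        "offset": offset,
        "symbol": frame.group("symbol").strip(),
        # Address at which the library was mapped in this particular run.
        "load_base": pc - offset,
    }


# Lines copied from the crash report in this post.
SAMPLE = """\
#  SIGSEGV (0xb) at pc=0x00007f6c2fa64ce9, pid=4189, tid=0x00007fa8dfc34700
# Problematic frame:
# C  [libVectorLoglessPairHMM9026850853863068944.so+0x1bce9]  LoadTimeInitializer::LoadTimeInitializer()+0x1669
"""

info = parse_crash(SAMPLE)
print(info["library"], hex(info["offset"]), hex(info["load_base"]))
```

Two reports that yield the same `library` and `offset` point at the same instruction, even when `pc` and the file name differ between runs.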

Here is the backtrace from GDB on the core dump file.

$ gdb java /scratch/local/joey/workdir/core
[…]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f486d6c275a in __GI_abort () at abort.c:89
#2  0x00007f486cfc13b5 in os::abort(bool) () from /usr/local/java/jre/lib/amd64/server/libjvm.so
#3  0x00007f486d163673 in VMError::report_and_die() () from /usr/local/java/jre/lib/amd64/server/libjvm.so
#4  0x00007f486cfc68bf in JVM_handle_linux_signal () from /usr/local/java/jre/lib/amd64/server/libjvm.so
#5  0x00007f486cfbce13 in signalHandler(int, siginfo*, void*) () from /usr/local/java/jre/lib/amd64/server/libjvm.so
#6  <signal handler called>
#7  0x00007f0bd6245ce9 in LoadTimeInitializer::LoadTimeInitializer() () from /project/seqcore-cluster/temp/joseph/libVectorLoglessPairHMM1517996528329513895.so
#8  0x00007f0bd6243487 in __sti__$E () from /project/seqcore-cluster/temp/joseph/libVectorLoglessPairHMM1517996528329513895.so
#9  0x00007f0bd625f126 in __do_global_ctors_aux () from /project/seqcore-cluster/temp/joseph/libVectorLoglessPairHMM1517996528329513895.so
#10 0x00007f0bd62326fb in _init () from /project/seqcore-cluster/temp/joseph/libVectorLoglessPairHMM1517996528329513895.so
[…]

Interestingly, rebuilding the library with GCC 5.3 and loading our own build with the documented option below fixes the issue for us.

-Djava.library.path=/src/gatk/public/VectorPairHMM/src/main/c++
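
For anyone scripting this, the option has to be passed to the JVM, so it goes before `-jar`. A minimal sketch of assembling such a command line follows. The helper `gatk_command`, the jar name, and the placeholder GATK arguments are assumptions for illustration; only the library path is taken from this post.

```python
import shlex


def gatk_command(gatk_jar, native_lib_dir, gatk_args):
    """Build a java command line that loads native libraries from native_lib_dir."""
    return [
        "java",
        # JVM options must come before -jar; anything after the jar
        # is handed to the program itself.
        f"-Djava.library.path={native_lib_dir}",
        "-jar",
        gatk_jar,
        *gatk_args,
    ]


cmd = gatk_command(
    "GenomeAnalysisTK.jar",  # hypothetical jar location
    "/src/gatk/public/VectorPairHMM/src/main/c++",  # path quoted in this post
    ["--analysis_type", "HaplotypeCaller"],  # placeholder; a real run needs more
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with the `c++` directory name.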

PS: Hopefully, this is the correct forum for such a report. The GitHub repository does not have the issue tracker enabled.

Issue · GitHub
Issue number 2270, by Geraldine_VdAuwera. State: closed (closed by vdauwera).
Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Hi there, this is indeed the right place for this. We'll ask the engineering team to take a look. In the meantime, could you please test whether this still occurs with the latest nightly build (see downloads page)? There have been some upgrades to that library so the problem might already be fixed.
  • pmenzel (Member)

    @Geraldine_VdAuwera, thank you for the quick reply.

    We will try the nightly version from https://software.broadinstitute.org/gatk/download/nightly.

  • pmenzel (Member)

    There is no crash with the nightly build from July 11th, 2017.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    Great! That will be released as version 3.8 soon. In the meantime your scientists can use the nightly build, though nightlies are not as thoroughly tested as releases.
  • pmenzel (Member)

    I’ll tell them that. Thank you.

  • rfn, Aalborg University Hospital (Member)

    Dear GATK

    I am getting a similar error when running HaplotypeCaller or Mutect2 with GATK 3.8 and openjdk-1.8.0.

    The error started after I switched the reference from b37 to the GDC hg38 reference genome, with the other reference files taken from the hg38 GATK bundle. Alignment and the rest of the pipeline up to variant calling work fine. I tried switching back to GATK 3.7 and running with -nct 1, but got the same error.

    I run Mutect2 with the following parameters:

    java -jar $gatk \
    --analysis_type MuTect2 \
    --reference_sequence $assembly \
    --input_file:normal $rawDataDirNormal/${normalid}_realigned_recal.bam \
    --input_file:tumor $rawDataDirTumor/${tumorid}_realigned_recal.bam \
    --out $outDir/${tumorid}_variants_${interval}.vcf \
    --cosmic $cosmic \
    --dbsnp $known_snp \
    -L $interval \
    --max_alt_alleles_in_normal_count 1000000 \
    --max_alt_allele_in_normal_fraction 0.1 \
    -nct 28
    

    And get the following output:

    INFO  11:43:18,795 HelpFormatter - -----------------------------------------------------------------------------------
    INFO  11:43:18,797 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18
    INFO  11:43:18,797 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO  11:43:18,797 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO  11:43:18,797 HelpFormatter - [Wed Oct 04 11:43:18 CEST 2017] Executing on Linux 3.10.0-514.10.2.el7.x86_64 amd64
    INFO  11:43:18,797 HelpFormatter - OpenJDK 64-Bit Server VM 1.8.0_144-b01
    INFO  11:43:18,800 HelpFormatter - Program Args: --analysis_type HaplotypeCaller -nct 28 --reference_sequence /home/projects/au_10001/data/GDC/GRCh38.d1.vd1.fa -I /home/projects/au_10001/data/GeneratedData/0173/0173N1S_realigned_recal.bam -o /home/projects/au_10001/data/GeneratedData/0173/0173N1S_variants.vcf --dbsnp /home/projects/au_10001/data/gatk_bundle_hg38/dbsnp_138.hg38.vcf
    INFO  11:43:18,804 HelpFormatter - Executing as rasbro@risoe-r04-cn089 on Linux 3.10.0-514.10.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_144-b01.
    INFO  11:43:18,805 HelpFormatter - Date/Time: 2017/10/04 11:43:18
    INFO  11:43:18,805 HelpFormatter - -----------------------------------------------------------------------------------
    INFO  11:43:18,805 HelpFormatter - -----------------------------------------------------------------------------------
    INFO  11:43:18,824 GenomeAnalysisEngine - Strictness is SILENT
    INFO  11:43:19,788 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 500
    INFO  11:43:19,795 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    WARNING: BAM index file /home/projects/au_10001/data/GeneratedData/0173/0173N1S_realigned_recal.bai is older than BAM /home/projects/au_10001/data/GeneratedData/0173/0173N1S_realigned_recal.bam
    INFO  11:43:19,887 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.09
    INFO  11:43:19,939 HCMappingQualityFilter - Filtering out reads with MAPQ < 20
    INFO  11:43:20,304 MicroScheduler - Running the GATK in parallel mode with 28 total threads, 28 CPU thread(s) for each of 1 data thread(s), of 28 processors available on this machine
    INFO  11:43:21,095 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO  11:43:22,067 GenomeAnalysisEngine - Done preparing for traversal
    INFO  11:43:22,068 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  11:43:22,068 ProgressMeter -                 |      processed |    time |         per 1M |           |   total | remaining
    INFO  11:43:22,069 ProgressMeter -        Location | active regions | elapsed | active regions | completed | runtime |   runtime
    INFO  11:43:22,069 HaplotypeCaller - Disabling physical phasing, which is supported only for reference-model confidence output
    INFO  11:43:22,113 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
    WARN  11:43:22,114 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
    INFO  11:43:22,114 StrandBiasTest - SAM/BAM data was found. Attempting to use read data to calculate strand bias annotations values.
    INFO  11:43:22,257 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units
    INFO  11:43:22,258 PairHMM - Performance profiling for PairHMM is disabled because the program is being run with multiple threads (-nct>1) option
    Profiling is enabled only when running in single thread mode
    
    Using AVX accelerated implementation of PairHMM
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002aabb5d36ce9, pid=8260, tid=0x00002aabb5312700
    #
    # JRE version: OpenJDK Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
    # Java VM: OpenJDK 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libVectorLoglessPairHMM5664959178835243957.so+0x1bce9]  LoadTimeInitializer::LoadTimeInitializer()+0x1669
    #
    # Core dump written. Default location: /scratch/18426356.risoe-r04-sn064.cm.cluster/core or core.8260
    #
    # An error report file with more information is saved as:
    # /scratch/18426356.risoe-r04-sn064.cm.cluster/hs_err_pid8260.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    Aborted (core dumped)
    

    Can you help me figure out what is going on here, and why it does not happen with b37?

  • SkyWarrior, Turkey (Member)

    OpenJDK is not supported. You need to install OracleJDK.

  • rfn, Aalborg University Hospital (Member)

    I get the same error with Oracle JDK.

  • SkyWarrior, Turkey (Member)

    Segfaults could also be due to miscompiled libraries. Do you have the same error with the precompiled binary?

  • rfn, Aalborg University Hospital (Member)

    So... this is slightly embarrassing... :blush:

    I downloaded the precompiled binary and it worked just fine, which led me to discover that in my hg38 pipeline $gatk was still set to 3.7 and not 3.8; I had only changed it in the b37 pipeline. The version number is also evident in the output above. So it seems I was hitting an old error that had already been fixed, and it now works with both Oracle Java and OpenJDK.

    Thanks for taking the time to reply.
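
A mix-up like this can be caught automatically by reading the version from the banner GATK prints at startup. The following is a minimal sketch that assumes the banner format shown in the log above; the names `gatk_version` and `check_version` are hypothetical, and other GATK releases may print the version differently.

```python
import re

# Banner format as it appears in the HelpFormatter output quoted in this thread.
BANNER_RE = re.compile(r"The Genome Analysis Toolkit \(GATK\) v(\d+)\.(\d+)")


def gatk_version(log_text):
    """Return (major, minor) from the startup banner, or None if absent."""
    match = BANNER_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_version(log_text, minimum=(3, 8)):
    """Raise if the log was produced by a GATK older than `minimum`."""
    found = gatk_version(log_text)
    if found is None:
        raise ValueError("no GATK banner found in log")
    if found < minimum:
        raise RuntimeError(
            f"log was produced by GATK {found[0]}.{found[1]}, "
            f"expected at least {minimum[0]}.{minimum[1]}"
        )
    return found


# Banner line copied from the output above.
BANNER = (
    "INFO  11:43:18,797 HelpFormatter - The Genome Analysis Toolkit (GATK) "
    "v3.7-0-gcfedb67, Compiled 2016/12/12 11:21:18"
)
print(gatk_version(BANNER))
```

Calling `check_version(BANNER)` on that line raises a `RuntimeError`, which is the signal that the pipeline is still pointing at the 3.7 jar.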
