GATK 3.8-0 PrintReads fatal error

fangpingmu, Pittsburgh, PA, Member
edited October 2017 in Ask the GATK team

Hello,

Could you please help me figure out this fatal error when running PrintReads?

After I updated GATK to version 3.8-0, I kept getting this fatal error when running PrintReads. I can skip this step and run HaplotypeCaller with the -BQSR option instead.

parsing sample: SRR098333


Post edited by Geraldine_VdAuwera on


Answers

  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    Can you check if this happens without -nct 8? Also, what does hs_err_pid1932.log say?

    Thanks,
    Sheila

  • fangpingmu, Pittsburgh, PA, Member
    edited October 2017

    This still happens without -nct 8.

    INFO  14:03:32,459 ProgressMeter -      2:55272600   2.6137418E7    25.5 m      58.0 s        9.8%     4.3 h       3.9 h
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f9eb2d4de8b, pid=28451, tid=0x00007f9eb0bc9700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64be8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    ...
    
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    /var/spool/slurmd/job466941/slurm_script: line 18: 28451 Aborted                 java -Xms16g -Xmx200g -XX:-UseParallelGC -Djava.io.tmpdir=/alignments/tmp -jar /home/apps/GATK/GenomeAnalysisTK-3.8.0/GenomeAnalysisTK.jar -T PrintReads -R /refs/GATK_Resource_Bundle/b37/human_g1k_v37.fasta -BQSR SRR098333.recal_data.table -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam
    
    
    The hs_err_pid***.log file is very long; it looks like the following:
    
    
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fb1f97fae8b, pid=9599, tid=0x00007f7fba24c700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64be8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    
    ...
    
    Stack: [0x00007f7fba14c000,0x00007f7fba24d000],  sp=0x00007f7fba24b7a0,  free space=1021k
    Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
    V  [libjvm.so+0x64be8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    V  [libjvm.so+0x97fab2]  ParCompactionManager::follow_marking_stacks()+0x232
    V  [libjvm.so+0x965915]  StealMarkingTask::do_it(GCTaskManager*, unsigned int)+0x325
    V  [libjvm.so+0x5daecf]  GCTaskThread::run()+0x12f
    V  [libjvm.so+0x92a338]  java_start(Thread*)+0x108
    
    ...
    

    I can confirm that this problem appears when GATK 3.8.0 or above is used. If I change /home/apps/GATK/GenomeAnalysisTK-3.8.0/GenomeAnalysisTK.jar in the command above to the GATK 3.7.0 GenomeAnalysisTK.jar, the command runs fine.

    The raw reads are public data, so you should be able to reproduce the errors. The samples are from SRA SRR098333 - SRR098338. I downloaded them using this command:

    fastq-dump --split-3 --qual-filter-1 SRR098333

    I also tried ReadsPipelineSpark in GATK 4.beta.5; 2 of the 8 samples gave a similar error, and SRR098338 is one of them. I also noticed that GATK 3.8.0 and above requires significantly more memory for the PrintReads or ApplyRecalibration step.

    Post edited by Geraldine_VdAuwera on
  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    What happens if you add ulimit -c unlimited to your command, as suggested in the error message?
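
    For example, a minimal sketch of how that could look in a job script (the jar path and file names here are placeholders, not your actual command):

    # Enable core dumps in this shell before launching Java (hypothetical paths).
    ulimit -c unlimited

    java -Xmx16g -jar GenomeAnalysisTK.jar \
        -T PrintReads \
        -R reference.fasta \
        -I input.bam \
        -BQSR recal_data.table \
        -o output_recalibrated.bam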

    -Sheila

  • fangpingmu, Pittsburgh, PA, Member

    That would write the core dump. In my original post, "ulimit -c unlimited" was added, so the log says "Core dump written. Default location: core or core.1932".

  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    Okay. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • fangpingmu, Pittsburgh, PA, Member

    GATK-3.8.0-PrintReads.zip

    Issue · GitHub
    by Sheila

    Issue Number: 2564
    State: closed
    Closed By: sooheelee
  • krdav, Member, Broadie

    Hmm, I just got a very similar fatal error running MuTect2 from the GATK4-5 version. I tried different Java versions and saw no difference. I also noticed that the error does not consistently appear at the same place; sometimes it can run for 30 minutes, other times 2 hours, before eventually crashing.

  • SkyWarrior, Turkey, Member ✭✭✭

    Inconsistent crashes usually indicate general system instability, I guess. What kind of system specs do you have?

  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    I will take a look soon.

    Thanks,
    Sheila

  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    I see you just provided the log output file. I need a snippet of a BAM file that I can reproduce the error with. Please see the instructions I linked to above for more information.

    Thanks,
    Sheila

    P.S. @krdav If you can submit a bug report, that would be great too.

  • fangpingmu, Pittsburgh, PA, Member

    I re-uploaded the bug report, GATK-3.8.0-PrintReads-crash-1.zip. For this BAM file and recal_data.table, the error consistently appears at the same place.

  • Ryan, Member
    edited October 2017

    I believe I had a similar error using Picard version 2.13.2 with Java version 8.0_144-b01 when trying to run the MarkDuplicates tool. I get the following error:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007ff9986b9e8b, pid=3288, tid=0x00007ff996906700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_144-b01) (build 1.8.0_144-b01)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.144-b01 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64be8b]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    

    I was able to move forward by reducing the Java option from -Xmx32G to -Xmx12G and had no issue generating the BAM file after that. Not sure if that would work for you too.
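
    For reference, a minimal sketch of that kind of invocation (file names are placeholders, not my actual command):

    # Hypothetical MarkDuplicates run with the reduced heap that worked here.
    java -Xmx12G -jar picard.jar MarkDuplicates \
        I=input.bam \
        O=marked_duplicates.bam \
        M=duplicate_metrics.txt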

    Post edited by Geraldine_VdAuwera on
  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    I am testing the files now. I may need you to re-submit, as I get an "EOF marker is missing" error for the BAM file. How did you make the BAM file you sent over?

    Thanks,
    Sheila

  • fangpingmu, Pittsburgh, PA, Member

    Within the zip file, you can find a file named command, which lists the detailed commands used to generate the BAM file. It takes several hours to get to this PrintReads step. Let me know whether you need me to re-submit the BAM file.

    At the Picard step, I used the Java options -Xms5g -Xmx16g.

  • bgrenier, France, Member
    edited October 2017

    Hi,

    I had the same error ("InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b") using GATK-3.8 GenotypeGVCFs on 4 combined batches of WGS split by chromosome. Each chromosome failed with this error. I tried several times, and each time it failed with the same error. The failure point seemed to be random according to the log (but I was using -nt 8, so the log files are likely untrustworthy).

    I then tried to follow Ryan's suggestion by lowering the Java memory option from -Xmx50g to -Xmx22g (still using -nt 8) and now it seems to work: chromosomes 8 to 22 worked fine and the others are still running.

  • Sheila, Broad Institute, Member, Broadie admin
    edited October 2017

    @bgrenier @Ryan
    Hi,

    Thanks for reporting your solution.

    @fangpingmu Can you try what the others posted and see if that fixes your issue? If it does not, I will need you to re-submit the BAM file, as I am getting an "EOF marker is missing" error for it.

    Thanks,
    Sheila

  • ericco92, Cambridge, UK, Member ✭✭

    I also had the same issue and the same solution: reducing the total VM memory to 16 GB with -Xmx<>g seems to have allowed things to run, even with multiple cores. I think I'm on Java 1.7, but I saw the same behavior on 1.8.

  • Sheila, Broad Institute, Member, Broadie admin

    @ericco92
    Hi,

    Great, thanks for letting us know. GATK4 only supports Java 1.8. Java 1.7 may run without errors, but things could be failing silently. It is best to use 1.8.

    -Sheila

  • fangpingmu, Pittsburgh, PA, Member

    I did some trial and error. When I reduced the memory settings to -Xms1g -Xmx5g, GATK 3.8.0 PrintReads runs OK for this example.
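
    For completeness, the working invocation looks roughly like the command in my original post with only the heap settings changed (paths abbreviated here):

    # Reduced heap settings under which GATK 3.8-0 PrintReads completes for this example.
    java -Xms1g -Xmx5g -jar GenomeAnalysisTK.jar \
        -T PrintReads \
        -R human_g1k_v37.fasta \
        -BQSR SRR098333.recal_data.table \
        -I SRR098333.bwa_mem.sorted_dups_removed_indelrealigner.bam \
        -o SRR098333.bwa_mem.sorted_dups_removed_indelrealigner_BQSR.bam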

  • Sheila, Broad Institute, Member, Broadie admin

    @fangpingmu
    Hi,

    Great, thanks for confirming. I am checking with the team to see why this solves the issue.

    -Sheila

  • Geraldine_VdAuwera, Cambridge, MA, Member, Administrator, Broadie admin

    Since it also affects Picard, I suspect the problem might be due to the GKL, which was patched quite recently. Can anyone confirm whether this still happens with the very latest nightly build?

  • alexsson, Member
    edited November 2017

    I have exactly the same problem! We are using GATK on a production server; is there any way we can patch GATK 3.8 without downloading nightly builds?

  • alexsson, Member
    edited November 2017

    I can confirm that the error is still there with the nightly build (nightly-2017-11-04-g45c474f).

    I also looked at the log, which shows that PrintReads is using -Xmx 91000m. It does not respect my Xmx setting in the global config file (I'm using bcbio-nextgen), which is set to -Xmx 7000m. I'm not sure how to reduce this Xmx setting via the config file in bcbio-nextgen (1.0.5); why does it not respect the value in the config file (bcbio_system.yaml)?

  • Sheila, Broad Institute, Member, Broadie admin

    @alexsson
    Hi,

    Have you tested this by running it directly on your computer, without bcbio-nextgen? If you can confirm this error still happens on your computer, I may need you to submit a bug report.

    -Sheila

  • tommycarstensen, United Kingdom, Member ✭✭✭
    edited November 2017

    I am unfortunately joining the choir. I get this error message with 3.8, 2017-10-06-g1994025, and 2017-11-07-g45c474f when running GenotypeGVCFs with --num_threads greater than 1 (I haven't tried with 1), on jre1.8.0_74 and jre1.8.0_60:

    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002b08914fc9ab, pid=23056, tid=47383404685056
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_74-b02) (build 1.8.0_74-b02)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.74-b02 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64c9ab]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    
  • I'm afraid I'm also having the same issue with GATK v3.8-0-ge9d806836, compiled 2017/07/28 21:26:50. GATK is used in two steps, one using BaseRecalibrator (works fine) and one using PrintReads (fails). Both were originally run with -Xmx128G and -nct 72. I've managed to get PrintReads to work if I set the maximum memory to 16GB or less. There may be a higher maximum, but I've not had time to increment enough, although I know 32GB causes PrintReads to fail. I also changed max threads to 36 instead of 72, and it still failed at 32GB, so the number of threads is either not important or doesn't have as big an impact. I've also used both OpenJDK and Oracle JDK, both giving the same issue.

  • tommycarstensen, United Kingdom, Member ✭✭✭

    It also fails with -nt 1 and -nct 1.

  • tommycarstensen, United Kingdom, Member ✭✭✭

    With GenotypeGVCFs 3.8, I lowered -nt and -Xmx from 24 and 64GB to 8 and 16GB, respectively. That seemed to do the trick.
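
    A minimal sketch of what that adjusted invocation could look like (the reference and GVCF names are placeholders):

    # Hypothetical GenotypeGVCFs run with the lowered thread count and heap described above.
    java -Xmx16g -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R reference.fasta \
        -V batch1.g.vcf \
        -V batch2.g.vcf \
        -nt 8 \
        -o combined.vcf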

    @bgrenier said:
    I then tried to follow Ryan's suggestion by lowering the Java memory option from -Xmx50g to -Xmx22g (still using -nt 8) and now it seems to work: chromosomes 8 to 22 worked fine and the others are still running.

    @Ryan said:
    I believe I had a similar error using Picard version 2.13.2 with Java version 8.0_144-b01 when trying to run the MarkDuplicates tool.

    I was able to move forward by reducing the Java option from -Xmx32G to -Xmx12G and had no issue generating the BAM file after that. Not sure if that would work for you too.

  • tommycarstensen, United Kingdom, Member ✭✭✭
    edited November 2017

    Thanks @shlee! I'll try that. By the way, the documentation reads as follows:
    "IntelDeflater (the new default in GATK version 3.8) and the JDK Deflater (the previous GATK default)"
    "IntelInflater (the new default in GATK version 3.8) and the JDK Inflater (the previous GATK default)"

    So which one was the previous default? Maybe there should just be one flag? Thanks again!!

    Hopefully this also solves my problem of a process using more CPU than I specify with -nt and -nct, a problem I did not have previously (i.e. with 3.4). On our cluster, a job gets killed if it uses more cores than it requested.

  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭

    JDK was the previous default for both the deflater and the inflater. I suppose there are situations in which mixing JDK with Intel for deflation vs. inflation might be desirable.
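
    As an illustration only, a hedged sketch of what that mixing could look like on a GATK 3.8 command line (the tool choice and file names are placeholders):

    # Hypothetical: switch only output compression back to the JDK deflater while
    # keeping the default Intel inflater for reading; add -jdk_inflater to switch both.
    java -Xmx16g -jar GenomeAnalysisTK.jar \
        -T PrintReads \
        -R reference.fasta \
        -I input.bam \
        -BQSR recal_data.table \
        -jdk_deflater \
        -o output.bam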

  • djwhiteepcc: Setting the jdk_inflater/deflater flags seems to fix my issues, so far. I will get in touch if anything changes. Cheers!

  • tommycarstensen, United Kingdom, Member ✭✭✭

    @djwhiteepcc said:
    Setting the jdk_inflater/deflater flags seems to fix my issues, so far. I will get in touch if anything changes. Cheers!

    I concur. I haven't experienced any issues after I made this change. Thanks.

  • rmf: I am running GATK nightly version nightly-2017-11-05-g45c474f because the supposedly stable 3.8-0 was running into errors during base recalibration. Now I am running PrintReads like below:

    java -Xmx64G -jar "$path_programs_gatk" \
    -T PrintReads \
    -nt 1 \
    -nct 8 \
    -R "$path_dr_genome" \
    -BQSR "$prefix-recal-data.txt" \
    -I $1 \
    -o "$prefix-recal.bam"
    

    And I get this error:

    ...
    INFO  11:22:11,392 ReadShardBalancer$1 - Loading BAM index data
    INFO  11:22:11,393 ReadShardBalancer$1 - Done loading BAM index data
    INFO  11:22:41,371 ProgressMeter -       1:1613138    200002.0    30.0 s       2.5 m        0.1%     7.1 h       7.1 h
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002b0147ed5aeb, pid=17344, tid=0x00002b116cb58700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64eaeb]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    #
    # Core dump written. Default location: /pica/v9/b2016094_nobackup/ngs/work-data/12-drg-f1/mapping/bwa/core or core.17344 (max size 10 kB). To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /pica/v9/b2016094_nobackup/ngs/work-data/12-drg-f1/mapping/bwa/hs_err_pid17344.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    /var/spool/slurmd/job11482470/slurm_script: line 43: 17344 Aborted                 (core dumped) java -Xmx64G -jar "$path_programs_gatk" -T PrintReads -nt 1 -nct 8 -R "$path_dr_genome" -BQSR "$prefix-recal-data.txt" -I $1 -o "/scratch/$SLURM_JOB_ID/$prefix-recal.bam"
    End of Script. Script took 45 seconds.
    [[email protected] bwa]$ cat hs_err_pid17344.log
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002b0147ed5aeb, pid=17344, tid=0x00002b116cb58700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64eaeb]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    #
    # Core dump written. Default location: /pica/v9/b2016094_nobackup/ngs/work-data/12-drg-f1/mapping/bwa/core or core.17344 (max size 10 kB). To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    

    I lowered the cores from 8 to 2 (i.e. changed -nct from 8 to 2) and the RAM from 64GB to 16GB (as suggested here). Now it seems to be running alright. The estimated completion time remained more or less the same (7.1 hours vs 8 hours) in both cases (8 cores vs 2 cores). That is a bit strange too.

  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭

    Hi @rmf,

    If you are optimizing run times for your setup that uses GATK 3.8, then perhaps you would also be interested in testing with the -jdk_deflater and -jdk_inflater flags.

  • buddej, St. Louis, Member

    I encountered the same error with Picard MarkDuplicates, as @Geraldine_VdAuwera mentioned above, so maybe this does have to do with the GKL. I tested with 2.14.0, 2.14.1, and a recent nightly, picard-2.14.0-7-g28a441a, all using Sun's jre1.8.0.151.

    MarkDuplicates succeeds with -Xmx24g, -Xmx30g or -Xmx31g
    MarkDuplicates fails with -Xmx32g (or -Xmx33g even)

    Strangely, the output .bam is 99.99+% complete (849903407 / 849905490 reads, in my large test case) at the point when Picard fails.

    The option USE_JDK_INFLATER=TRUE does not change anything: Picard still fails with the same error.
    Adding the option USE_JDK_DEFLATER=TRUE results in a successful run with -Xmx32g (or larger).

    The output .bam files are identical when viewed, and only about 1% larger on average. Runtime was 10% longer with the JDK deflater, though.
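
    For reference, a sketch of the successful configuration (file names are placeholders, not my actual command):

    # Hypothetical MarkDuplicates run at the previously failing heap size,
    # with only the deflater switched back to the JDK implementation.
    java -Xmx32g -jar picard.jar MarkDuplicates \
        I=input.bam \
        O=marked_duplicates.bam \
        M=duplicate_metrics.txt \
        USE_JDK_DEFLATER=true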

  • SkyWarrior, Turkey, Member ✭✭✭

    I am observing the same error with other GATK tools as well (version 3.8). -Xmx greater than 31G always fails, no matter what you do with GATK. This could be an issue with the GKL.

  • rmf, Member
    edited November 2017

    Related to my previous comment: I lowered the RAM from 64GB to 16GB and got it to run, but then at some point it crashed due to an out-of-memory error.

    This is difficult to work with. It's either too much RAM or too little RAM. Is there a way around this?

    Can I use the PrintReads tool from another version of GATK? This sounds like a bad idea, but I am trying to avoid having to re-run the whole pipeline with an older version of GATK. In hindsight, I should have done that.

  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭
    edited November 2017

    @buddej, @SkyWarrior, @rmf et al.,

    Can you tell us more about the systems (hardware, OS, etc.) you are running these on? It's my understanding that the GKL is meant to accelerate analyses only on specific hardware.

    Also @rmf, can you post your exact PrintReads command? It's not clear which inflater/deflater you are using.

    Our production pipelines use different versions of GATK for different analysis tasks; see this reference implementation for an example. If you know that a particular version of a tool works as expected for your aims, i.e. you have validated it, then there is little reason to upgrade that tool to a version with no relevant improvements, beyond the convenience of using a single GATK version. It's my understanding there are few tool-specific changes between v3.7 and v3.8. Release notes are at https://software.broadinstitute.org/gatk/blog?id=10063.

    I will see if the GKL folks have any additional insight.

    P.S. GKL question is posted at https://github.com/Intel-HLS/GKL/issues/81.

  • SkyWarrior, Turkey, Member ✭✭✭

    I am running GATK 3.8 on a 128 GB RAM Intel Skylake-X system with an Ubuntu 4.13 kernel and Oracle Java 1.8.0_151. There is nothing fancy about the configuration or the OS, so everything is pretty much vanilla.

    Regardless of the Java version, I experience this segfaulting whenever my -Xmx is 32G or higher.

    For the sake of comparison, I can try an earlier version like 3.7, and also 4.beta.6, to see if the segfaulting persists. I can give definitive feedback on Monday.

  • rmf, Member
    edited November 2017

    I am running GATK version nightly-2017-11-05-g45c474f. I am using a computing cluster running Scientific Linux (Red Hat 4.4.7-18); this is what I could find. The nodes are dual-CPU (Intel Xeon E5-2660) with 8 GB RAM per core. The Java version is sun_jdk1.8.0_92. I think 32GB RAM is the threshold beyond which the errors start, but depending on the BAM file I get out-of-memory errors at lower RAM (8GB/16GB/24GB). I don't know anything about this inflater/deflater. My code looks like this:

    This works:

    prefix="${1/.bam/}"
    java -Xmx24G -jar "$path_programs_gatk" \
    -T PrintReads \
    -nt 1 \
    -nct 3 \
    -R "$path_genome" \
    -BQSR "$prefix-recal-data.txt" \
    -I $1 \
    -o "$prefix-recal.bam"
    

    This does not work:

    prefix="${1/.bam/}"
    java -Xmx32G -jar "$path_programs_gatk" \
    -T PrintReads \
    -nt 1 \
    -nct 4 \
    -R "$path_genome" \
    -BQSR "$prefix-recal-data.txt" \
    -I $1 \
    -o "$prefix-recal.bam"
    

    Produces the error below:

    INFO  21:07:27,439 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO  21:07:27,442 HelpFormatter - The Genome Analysis Toolkit (GATK) vnightly-2017-11-05-g45c474f, Compiled 2017/11/05 00:01:14
    INFO  21:07:27,442 HelpFormatter - Copyright (c) 2010-2016 The Broad Institute
    INFO  21:07:27,442 HelpFormatter - For support and documentation go to https://software.broadinstitute.org/gatk
    INFO  21:07:27,442 HelpFormatter - [Sat Nov 11 21:07:27 CET 2017] Executing on Linux 2.6.32-696.13.2.el6.x86_64 amd64
    INFO  21:07:27,442 HelpFormatter - Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14
    INFO  21:07:27,446 HelpFormatter - Program Args: -T PrintReads -nt 1 -nct 4 -R /home/royfranc/ngs/common-data/ensembl/danio_rerio/87/Danio_rerio.GRCz10.dna.toplevel.fa -BQSR c20-02-bwa-recal-data.txt -I c20-02-bwa.bam -o /scratch/11502981/c20-02-bwa-recal.bam
    INFO  21:07:27,455 HelpFormatter - Executing as [email protected] on Linux 2.6.32-696.13.2.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14.
    INFO  21:07:27,456 HelpFormatter - Date/Time: 2017/11/11 21:07:27
    INFO  21:07:27,456 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO  21:07:27,456 HelpFormatter - ---------------------------------------------------------------------------------------------
    INFO  21:07:27,510 NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/pica/v9/b2016094_nobackup/ngs/programs/gatk/GenomeAnalysisTK-nightly-2017-11-05-g45c474f/GenomeAnalysisTK.jar!/com/intel/gkl/native/libgkl_compression.so
    INFO  21:07:27,525 GenomeAnalysisEngine - Deflater: IntelDeflater
    INFO  21:07:27,525 GenomeAnalysisEngine - Inflater: IntelInflater
    INFO  21:07:27,527 GenomeAnalysisEngine - Strictness is SILENT
    INFO  21:07:28,275 ContextCovariate -           Context sizes: base substitution model 2, indel substitution model 3
    INFO  21:07:28,345 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO  21:07:28,363 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO  21:07:28,563 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.20
    INFO  21:07:28,651 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 4 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine
    INFO  21:07:29,009 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO  21:07:29,013 GenomeAnalysisEngine - Done preparing for traversal
    INFO  21:07:29,015 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  21:07:29,015 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
    INFO  21:07:29,015 ProgressMeter -        Location |     reads | elapsed |     reads | completed | runtime |   runtime
    INFO  21:07:29,045 ReadShardBalancer$1 - Loading BAM index data
    INFO  21:07:29,066 ReadShardBalancer$1 - Done loading BAM index data
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00002b59c8e3aaeb, pid=47277, tid=0x00002b61e5fac700
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_92-b14) (build 1.8.0_92-b14)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode linux-amd64 )
    # Problematic frame:
    # V  [libjvm.so+0x64eaeb]  InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b
    #
    # Core dump written. Default location: /pica/v9/b2016094_nobackup/ngs/work-data/12-drg-f1/mapping/bwa/core or core.47277 (max size 10 kB). To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /pica/v9/b2016094_nobackup/ngs/work-data/12-drg-f1/mapping/bwa/hs_err_pid47277.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    #
    /var/spool/slurmd/job11502981/slurm_script: line 43: 47277 Aborted                 (core dumped) java -Xmx32G -jar "$path_programs_gatk" -T PrintReads -nt 1 -nct 4 -R "$path_dr_genome" -BQSR "$prefix-recal-data.txt" -I $1 -o "/scratch/$SLURM_JOB_ID/$prefix-recal.bam"
    End of Script. Script took 54 seconds.
    
    
    Post edited by rmf on
  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭

    Hi @rmf,

    Here are some options to consider. Some come from other users who have reported that they turn erring runs into runs that finish.

    [1] Limiting garbage collection's memory use with -XX:+UseSerialGC. It's my understanding that without this parameter, garbage collection will take as much memory as you have available.
    [2] Using the gatk-launch script to invoke the jar. This sets a number of options on your behalf for optimal runs.
    [3] Switching v3.8's new default GKL inflater/deflater to the previous default JDK inflater/deflater by adding -jdk_deflater and -jdk_inflater to the GATK tool command; see the sketch after this list. There are similar options for Picard. One user stated that only switching back to the JDK deflater was necessary for their run to succeed and the inflater did not matter.
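
    As a sketch of options [1] and [3] combined on one command line (file names are placeholders, and PrintReads stands in for whichever tool is erring):

    # Hypothetical PrintReads run combining the workarounds above:
    # serial garbage collection plus the JDK inflater and deflater.
    java -Xmx16g -XX:+UseSerialGC -jar GenomeAnalysisTK.jar \
        -T PrintReads \
        -R reference.fasta \
        -I input.bam \
        -BQSR recal_data.table \
        -jdk_deflater \
        -jdk_inflater \
        -o output.bam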

  • SkyWarrior, Turkey, Member ✭✭✭

    My current findings:
    All runs used the GenotypeGVCFs tool, Java 1.8.0_151, Ubuntu 17.10, kernel 4.13 and 4.14 (vanilla builds, no gimmicks), and GATK 3.8.

    All GVCFs were generated in accordance with the GATK Best Practices, using HaplotypeCaller -ERC GVCF and GATK 3.8 or GATK 3.7.

    1- -Xmx24G, 133 samples, -nt 1, Intel (in/de)flaters -- runs fine and completes in 7 hours
    2- -Xmx32G, 133 samples, -nt 1, Intel (in/de)flaters -- SEGFAULTS in 5 to 10 minutes
    3- -Xmx48G, 133 samples, -nt 4, Intel (in/de)flaters -- SEGFAULTS in 5 to 10 minutes
    4- -Xmx32G, 133 samples, -nt 8, JDK (in/de)flaters -- runs fine and completes in 1 hour
    5- -Xmx96G, 133 samples, -nt 16, JDK (in/de)flaters -- runs fine and completes in 54 minutes
    6- -Xmx48G, 133 samples, -nt 8, Intel (in/de)flaters, -XX:+UseSerialGC -- SEGFAULTS in 5 to 10 minutes
    7- -Xmx32G, 133 samples, -nt 1, Intel (in/de)flaters, -XX:+UseSerialGC -- SEGFAULTS immediately after the GATK version log

    This clearly shows that either the JNI implementation or the GKL libraries could be the culprit for the memory allocation problem.

    I could not try this with GATK4 yet, because GATK4 does not allow file lists as input and currently prefers GenomicsDB, but I have no interest in digging into that area until GATK4 is ready for primetime.
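
    For reference, a sketch of what run 4 above looks like as a GATK 3.8 command (file names are placeholders; the .list file is a plain-text file with one GVCF path per line):

    # Hypothetical sketch of configuration 4: JDK inflater/deflater, 8 data threads, 32G heap.
    java -Xmx32G -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R reference.fasta \
        -V gvcfs.list \
        -nt 8 \
        -jdk_deflater \
        -jdk_inflater \
        -o joint_genotyped.vcf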

  • @shlee Simply setting -jdk_deflater and -jdk_inflater seems to make it work for (GATK) vnightly-2017-11-05-g45c474f. But only 6 cores seem to be used, even when 8 cores (-nct 8) are provided.

  • buddej, St. Louis, Member

    @shlee said:
    @buddej, @SkyWarrior, @rmf et al.,

    Can you tell us more about the systems (hardware, OS, etc.) you are running these on? It's my understanding that the GKL is meant to accelerate analyses only on specific hardware.

    CentOS Linux release 7.3.1611 (Core)
    $ uname -a
    Linux HOSTNAME 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

    Two Xeon E5-2695-v3 14-core processors, HT-enabled

    768 GB DDR-4 memory (24x 32GB DIMMs, 1866MHz, Quad-Ranked)

  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭

    Thanks for the additional information and for keeping up with this thread. We may have additional questions for you.

    The Intel-HLS team is looking into this at <https://github.com/Intel-HLS/GKL/issues/81>. In the meantime, for those reading this thread with similar issues, please add -jdk_deflater and -jdk_inflater to the erring commands. Also, if you could post your system info like @SkyWarrior and @buddej above (thank you again), that will help toward getting this issue solved.
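
    For example, the output of commands along these lines (assuming a Linux system) covers the basics:

    # Commands for gathering the requested system information on Linux.
    uname -a             # kernel version and architecture
    cat /etc/os-release  # distribution and release
    lscpu                # CPU model and core count
    free -h              # installed memory
    java -version        # JVM vendor and version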

  • shlee, Cambridge, Member, Broadie ✭✭✭✭✭

    Hi everyone. The bug was identified and fixed on the Intel-HLS side today, and the fix will go into GATK going forward. We are currently on GATK v3.8 and v4.beta.6, so the next respective releases should contain the fix. In computer-speak, the bug was a memory corruption issue in which the GKL wrote to Java memory in a way that could then result in a segmentation fault. Thanks again for bringing this to our attention.

  • SkyWarrior, Turkey, Member ✭✭✭

    Great news. Should we expect a 3.9 release or a 3.8.1-type bug-fix release, or maybe a nightly?

  • Sheila, Broad Institute, Member, Broadie admin

    @SkyWarrior
    Hi,

    Yes, there will be a new GATK3 release with the fix in it. Keep an eye out for the announcement :smiley:

    -Sheila
