GATK 3.8-0 PrintReads fatal error

Hello,
Could you please help me to figure out this fatal error in running PrintReads?
After I updated GATK to version 3.8-0, I kept getting this fatal error when running PrintReads. I can skip this step and run HaplotypeCaller with the -BQSR option instead.
parsing sample: SRR098333
Best Answer
shlee Cambridge ✭✭✭✭✭
Hi @tommycarstensen, @djwhiteepcc, @alexsson et al. I have seen a similar error when running Picard SortSam on a linux cloud VM.
# A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007f02883f757e, pid=32065, tid=0x00007f02856d9700
One issue for me was that there was no space left on my VM. The solution (if I recall correctly) was to mount another drive to my VM instance with 500GB storage.
So why do we now see in GATK v3.8 an error I saw with Picard? I have one hypothesis. Let me know if the following sounds plausible.
GATK v3.8 implements a feature that Picard adopted a while back, in July 2017, for v2.10.4: the Intel inflater/deflater, which impacts I/O and memory. GATK 3.8 now uses the Intel inflater/deflater by default, where previously we used the JDK inflater/deflater. I explain the impact of these in the context of Picard tools at https://gatkforums.broadinstitute.org/gatk/discussion/comment/42849.
To go back to using the JDK inflater/deflater in GATK v3.8, you can specify the -jdk_deflater and -jdk_inflater flags. The v3.8 release notes (here) also say to specify -pairHMM LOGLESS_CACHING to disable the Intel chip optimizations for HaplotypeCaller, in case this could be an additional source of the odd Java runtime error.
As an aside, I believe that in GATK4 (though I'm not sure about GATK3) the default BAM COMPRESSION_LEVEL will be lowered from 5 to 1 or 2, which will also impact memory usage. Please be sure these settings are as you expect them to be.
If the problems persist, or are solved after going back to the JDK inflater/deflater, please let us know right away. Our developers will be interested in recapitulating the error with test data so they can find a solution for the Intel inflater/deflater. We would be grateful if one or all of you could submit some data following the instructions in https://software.broadinstitute.org/gatk/guide/article?id=1894. Thanks.
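For concreteness, those flags simply slot into the tool invocation. This is a sketch only: the jar path, reference, and input/output file names below are hypothetical placeholders, not files from this thread.

```shell
# Sketch: run PrintReads with GATK 3.8 reverted to the JDK
# inflater/deflater. All paths below are placeholders.
java -Xmx16g -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I input.bam \
    -BQSR recal_data.table \
    -jdk_deflater -jdk_inflater \
    -o recalibrated.bam
```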
Answers
@fangpingmu
Hi,
Can you check if this happens without -nct 8? Also, what does hs_err_pid1932.log say?
Thanks,
Sheila
This still happens without -nct 8.
I can confirm that this problem appears when GATK 3.8.0 or above is used. If I change the above /home/apps/GATK/GenomeAnalysisTK-3.8.0/GenomeAnalysisTK.jar to the GATK 3.7.0 GenomeAnalysisTK.jar, the command runs fine.
The raw reads are public data. You should be able to reproduce the errors. The samples are from SRA SRR098333 - SRR098338. I download them using this command:
fastq-dump --split-3 --qual-filter-1 SRR098333
I also tried ReadsPipelineSpark in GATK 4.beta.5; 2 out of the 8 samples give a similar error, SRR098338 being one. I also notice that GATK versions above 3.8.0 require significantly more memory for the PrintReads or ApplyRecalibration step.
@fangpingmu
Hi,
What happens if you add ulimit -c unlimited to your command, as suggested in the error message?
-Sheila
That will write the core dump. In my original post, "ulimit -c unlimited" was already added, hence the message "Core dump written. Default location: core or core.1932".
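A side note on mechanics, since this trips people up: ulimit -c only affects processes launched from the same shell afterwards, so it must precede the java invocation. A minimal sketch (the GATK jar line is a placeholder):

```shell
# Core dump size is capped by the shell's soft limit; raise it in the
# same shell that will launch the JVM.
ulimit -c unlimited
ulimit -c    # shows the new limit ("unlimited" if the hard limit permits)
# java -jar GenomeAnalysisTK.jar ...   (placeholder; run in this shell)
```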
@fangpingmu
Hi,
Okay. Can you submit a bug report? Instructions are here.
Thanks,
Sheila
GATK-3.8.0-PrintReads.zip
Issue · Github
by Sheila
Hmm, I just got a very similar fatal error running MuTect2 from the GATK4-5 version. I tried different Java versions with no difference. I also noticed that the error does not consistently show up at the same place; sometimes it can run for 30 minutes, other times 2 hours, before eventually crashing.
Inconsistencies indicate a general system instability I guess. What kind of system specs do you have?
@fangpingmu
Hi,
I will take a look soon.
Thanks,
Sheila
@fangpingmu
Hi,
I see you just provided the log output file. I need a snippet of a BAM file that I can reproduce the error with. Please see the instructions I linked to above for more information.
Thanks,
Sheila
P.S. @krdav If you can submit a bug report, that would be great too.
I re-uploaded the bug report, GATK-3.8.0-PrintReads-crash-1.zip. For this BAM file and recal_data.table, the error consistently shows at the same place.
I believe I had a similar error using Picard version 2.13.2 with Java version 8.0_144-b01 when trying to run the MarkDuplicates tool. I get the following error:
I was able to move forward by reducing the Java option from -Xmx32G to -Xmx12G, and had no issue generating the BAM file after that. Not sure if that would work for you too.
@fangpingmu
Hi,
I am testing the files now. I may need you to re-submit, as I get an "EOF marker is missing" error for the BAM file. How did you make the BAM file you sent over?
Thanks,
Sheila
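Background on that error, for anyone else hitting it: per the SAM/BAM specification, a valid BAM must end with a fixed 28-byte BGZF EOF block, and "EOF marker is missing" usually means the file was truncated during copy or upload. A quick shell check is possible; the demo below builds a stand-in file that ends with the marker (substitute your real BAM for demo.bam):

```shell
# A BAM must end with the 28-byte BGZF EOF block from the SAM/BAM spec.
# Build a stand-in file ending with the marker, then verify its tail.
bgzf_eof='\037\213\010\004\000\000\000\000\000\377\006\000\102\103\002\000\033\000\003\000\000\000\000\000\000\000\000\000'
printf "$bgzf_eof" > demo.bam
expected='1f8b08040000000000ff0600424302001b0003000000000000000000'
actual=$(tail -c 28 demo.bam | od -An -tx1 | tr -d ' \n')
[ "$actual" = "$expected" ] && echo 'EOF block present' || echo 'truncated BAM'
```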
Within the zip file, you can find a file named command, which lists the detailed commands used to generate the BAM file. It takes several hours to get to this PrintReads step. Let me know whether you need me to re-submit the BAM file. At the Picard step, I did use the Java options -Xms5g -Xmx16g.
Hi,
I had the same error ("InstanceKlass::oop_follow_contents(ParCompactionManager*, oopDesc*)+0x16b") using GATK 3.8 GenotypeGVCFs on 4 combined batches of WGS split by chromosome. Every chromosome failed with this error. I tried several times and it failed each time. The failure point seemed to be random according to the log (but I am using -nt 8, so log files are likely untrustworthy).
I then tried to follow Ryan's suggestion by lowering the Java memory option from -Xmx50g to -Xmx22g (still using -nt 8) and now it seems to work: chromosomes 8 to 22 finished fine and the others are still running.
@bgrenier @Ryan
Hi,
Thanks for reporting your solution.
@fangpingmu Can you try what the others posted and see if that fixes your issue? If it does not, I will need you to re-submit the BAM file, as I am getting an EOF marker missing error for the BAM file.
Thanks,
Sheila
I also had the same issue with the same solution: reducing the total VM memory to 16GB with -Xmx<>g seems to have allowed things to run, even with multiple cores. I think I'm on Java 1.7, but I saw the same behavior on 1.8.
@ericco92
Hi,
Great, thanks for letting us know. GATK4 only supports Java 1.8. Java 1.7 may run without errors, but things could be failing silently. It is best to use 1.8.
-Sheila
I performed trial and error. When I reduced the memory requirement to -Xms1g -Xmx5g, GATK 3.8.0 PrintReads runs OK for this example.
@fangpingmu
Hi,
Great, thanks for confirming. I am checking with the team why this may solve the issue.
-Sheila
Since it also affects Picard I suspect the problem might be due to the GKL, which was patched quite recently. Can anyone confirm if this still happens with the very latest nightly build?
I have exactly the same problem!!!! We are using gatk on a production server, any way we can patch gatk 3.8 without downloading nightly builds?
I can confirm that the error is still there with the nightly build (nightly-2017-11-04-g45c474f).
I also looked at the log, which shows that PrintReads is using -Xmx 91000m. It does not respect my Xmx setting in the global config file (I'm using bcbio-nextgen), which is set to -Xmx 7000m. I'm not sure how to reduce this Xmx setting via the config file in bcbio-nextgen (1.0.5); why does it not respect the value in the config file (bcbio_system.yaml)?
@alexsson
Hi,
Have you tested this just running on your computer without bcbio-nextgen? If you confirm this error still happens on your computer, I may need you to submit a bug report.
-Sheila
I am unfortunately joining the choir. I get this error message with 3.8, 2017-10-06-g1994025 and 2017-11-07-g45c474f when running GenotypeGVCFs with --num_threads greater than 1 (haven't tried with 1), with jre1.8.0_74 and jre1.8.0_60:
I'm afraid I'm also having the same issue with GATK v3.8-0-ge9d806836, compiled 2017/07/28 21:26:50. GATK is used in two steps, one using BaseRecalibrator (works fine) and one for PrintReads (fails). Both were originally run with -Xmx128G and -nct 72. I've managed to get PrintReads to work if I set the maximum memory to 16GB or less. There may be a higher maximum, but I've not had time to increment enough, although I know 32GB causes PrintReads to fail. I also changed max threads to 36 instead of 72, and it still failed at 32GB, so the number of threads is either not important or doesn't have as big an impact. I've also used both OpenJDK and Oracle JDK, both giving the same issue. It also fails with -nt 1 and -nct 1.
With GenotypeGVCFs 3.8 I lowered -nt and -Xmx
from 24 and 64GB to 8 and 16GB, respectively. That seemed to do the trick.
Thanks @shlee ! I'll try that. The documentation reads as follows by the way:
"IntelDeflater (the new default in GATK version 3.8) and the JDK Deflater (the previous GATK default)"
"IntelInflater (the new default in GATK version 3.8) and the JDK Inflater (the previous GATK default)"
So which one was the previous default? Maybe there should just be one flag? Thanks again!!
Hopefully this also solves my problem of a process using more CPU than I specify with -nt and -nct; a problem I did not have previously (i.e. with 3.4). On our cluster, a job gets killed if it uses more cores than requested.
JDK was the previous default for both the deflater and the inflater. I suppose there are situations in which mixing JDK with Intel for deflation vs. inflation might be desirable.
Setting the jdk_inflater/deflater flags seems to fix my issues, so far. I will get in touch if anything changes. Cheers!
I concur. I haven't experienced any issues after I made this change. Thanks.
I am running GATK nightly version nightly-2017-11-05-g45c474f because the supposedly stable 3.8-0 was running into errors during base recalibration. Now I am running PrintReads as below, and I get this error:
I lowered cores from 8 to 2 and RAM from 64GB to 16GB (as suggested here), changing -nct from 8 to 2. Now it seems to be running alright. The estimated completion time remained more or less the same (7.1 hours vs 8 hours) in both cases (8 cores vs 2 cores), which is a bit strange too.
Hi @rmf,
If you are optimizing run times for your setup that uses GATK 3.8, then perhaps you would be interested in additionally testing with the -jdk_deflater and -jdk_inflater flags.
I encountered the same error with picard MarkDuplicates as @Geraldine_VdAuwera mentioned above, so maybe this does have to do with GKL. I tested with 2.14.0, 2.14.1, and a recent nightly picard-2.14.0-7-g28a441a, all using Sun's jre1.8.0.151.
MarkDuplicates succeeds with -Xmx24g, -Xmx30g or -Xmx31g. MarkDuplicates fails with -Xmx32g (or even -Xmx33g). Strangely, the .bam that is output is 99.99+% complete (849903407 / 849905490 reads, in my large test case) at the point when picard fails.
The option USE_JDK_INFLATER=TRUE does not affect anything; picard still fails with the same error. Adding the option USE_JDK_DEFLATER=TRUE results in a successful run with -Xmx32g (or larger).
The output .bam files are identical when viewed, and only about 1% larger on average. Runtime was 10% longer with the JDK deflater, though.
I am observing the same error with other GATK tools as well (version 3.8). -Xmx more than 31G always fails no matter what you do with GATK. Could be an issue about GKL.
Related to my previous comment: I lowered RAM from 64GB to 16GB and got it to run, but then at some point it crashed with an out-of-memory error.
This is difficult to work with; it's either too much RAM or too little. Is there a way around it?
Can I use PrintReads from another version of GATK? This sounds like a bad idea, but I am trying to avoid having to run the whole pipeline with an older version of GATK. In hindsight, I should've done that.
@buddej, @SkyWarrior, @rmf et al.,
Can you tell us more about the systems (hardware, os etc) you are running these on? It's my understanding the GKL is meant to accelerate analyses only on specific hardware.
Also @rmf, can you post your exact PrintReads command? It's not clear which inflater/deflater you are using.
Our production uses different versions of GATK per analysis task; see this reference implementation for an example. If you know a particular version of a tool works as expected for your aims, i.e. it is validated by you, then there is little reason to upgrade it just for the convenience of using a single version, unless the tool itself has been improved in the newer release. It's my understanding there are few tool-specific changes between v3.7 and v3.8. Release notes are at https://software.broadinstitute.org/gatk/blog?id=10063.
I will see if the GKL folks have any additional insight.
P.S. GKL question is posted at https://github.com/Intel-HLS/GKL/issues/81.
I am running GATK 3.8 on a 128GB RAM Intel Skylake-X system with Ubuntu (4.13 kernel) and Oracle Java 1.8.0_151. There is nothing fancy in the configuration or the OS, so everything is pretty much vanilla.
Regardless of the Java version, I experience this segfaulting whenever my Xmx is 32G or higher.
For the sake of comparison I can try an earlier version like 3.7 and also 4.beta6 to see if the segfaulting persists. I can give definitive feedback on Monday.
I am running this version of GATK: nightly-2017-11-05-g45c474f. I am using a computing cluster running Scientific Linux Red Hat 4.4.7-18; that is what I could find. The cores are dual CPU (Intel Xeon E5-2660) with 8 GB RAM per core. The Java version is sun_jdk1.8.0_92. I think 32GB RAM is the threshold beyond which the errors start, but depending on the BAM file, I get out-of-memory errors at lower RAM (8GB/16GB/24GB). I don't know anything about this inflater/deflater. My code looks like the following.
This works:
This does not work:
Produces the error below:
Hi @rmf,
Here are some options to consider. Some of these come from other users who have said they turn erring runs into runs that finish.
[1] Limit garbage collection's memory use with -XX:+UseSerialGC. It's my understanding that without this parameter, garbage collection will take as much memory as you have available.
[2] Use the gatk-launch script to invoke the jar. This sets a number of options on your behalf for optimal runs.
[3] Switch v3.8's new default GKL inflater/deflater back to the previous default JDK inflater/deflater by adding -jdk_deflater and -jdk_inflater to the GATK tool command. There are similar options for Picard. One user stated that only switching back to the JDK deflater was necessary for their run to succeed, and the inflater did not matter.
My current findings:
All tried with the GenotypeGVCFs tool, Java 1.8.0_151, Ubuntu 17.10, kernel 4.13 and 4.14 (vanilla builds, no gimmicks), GATK 3.8.
All gvcfs were generated in accordance with GATK best practices with HaplotypeCaller -ERC GVCF and GATK 3.8 or GATK 3.7.
1- -Xmx24G 133 samples with -nt 1 intelflaters -- Runs fine and completes in 7 hours
2- -Xmx32G 133 samples with -nt 1 intelflaters -- SEGFAULTS in 5 to 10 minutes
3- -Xmx48G 133 samples with -nt 4 intelflaters -- SEGFAULTS in 5 to 10 minutes
4- -Xmx32G 133 samples with -nt 8 jdkflaters -- Runs fine and completes in 1 hour
5- -Xmx96G 133 samples with -nt 16 jdkflaters -- Runs fine and completes in 54 minutes
6- -Xmx48G 133 samples with -nt 8 intelflaters -XX:+UseSerialGC -- SEGFAULTS in 5 to 10 minutes
7- -Xmx32G 133 samples with -nt 1 intelflaters -XX:+UseSerialGC -- SEGFAULTS immediately after the GATK version log.
This clearly shows that either the JNI implementation or the GKL libraries could be the culprit for the memory allocation problem.
I could not try this with GATK4 yet because GATK4 does not allow file lists as input and currently prefers GenomicsDB, but I have no interest in digging into that area until GATK4 is ready for primetime.
@shlee Simply setting -jdk_deflater and -jdk_inflater seems to make it work for (GATK) nightly-2017-11-05-g45c474f. But only 6 cores seem to be used even when 8 cores (-nct 8) are provided.
CentOS Linux release 7.3.1611 (Core)
$ uname -a
Linux HOSTNAME 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Two Xeon E5-2695-v3 14-core processors, HT-enabled
768 GB DDR-4 memory (24x 32GB DIMMs, 1866MHz, Quad-Ranked)
Thanks for the additional information and for keeping up with this thread. We may have additional questions for you.
The Intel-HLS team is looking into this at https://github.com/Intel-HLS/GKL/issues/81. In the meanwhile, for those reading this thread with similar issues, please add -jdk_deflater and -jdk_inflater to the erring commands. Also, if you could post your system info like @SkyWarrior and @buddej above (thank you again), it will be helpful towards getting this issue solved.
Hi everyone. The bug was identified and fixed on the Intel-HLS side today, and the fix will flow into GATK going forward. We are currently on GATK v3.8 and v4.beta.6, so the next respective releases should fix the bug. In computer-speak, the bug was a memory corruption issue where GKL was writing to Java memory, which could then result in a segmentation fault. Thanks again for bringing this to our attention.
Great news. Should we expect a 3.9 version, a 3.8.1-type bugfix version, or maybe a nightly?
@SkyWarrior
Hi,
Yes, there will be a new GATK3 release with the fix in it. Keep an eye out for the announcement.
-Sheila