Truncated BAM error in recalibrated BAM file

vivekruhela · Member
edited March 2018 in Ask the GATK team

I am trying to complete a pipeline that produces variant calls and annotations using various GATK tools, but I am stuck at the last step of post-processing. These are my post-processing steps:

  1. SAM-to-BAM conversion
  2. BAM validation
  3. Sort the BAM file (SORT_ORDER="queryname")
  4. Fix mate information (fixmate)
  5. Sort the BAM again (SORT_ORDER="coordinate")
  6. Remove duplicates
  7. Indel realignment
  8. BQSR
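For orientation, the eight steps could map onto commands roughly as in the dry-run sketch below, which only prints each command. The tool choices are assumptions not stated in the post (Picard for steps 2 to 6, GATK 3 walkers for steps 7 and 8), and `picard.jar`, `reference.fasta` and every intermediate file name are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the eight post-processing steps: prints commands, runs nothing.
# ASSUMPTIONS: Picard handles steps 2-6, GATK 3 handles steps 7-8;
# picard.jar, reference.fasta and all file names are hypothetical.
set -euo pipefail

PICARD="java -jar picard.jar"
GATK="java -jar GenomeAnalysisTK.jar"
REF=reference.fasta
KNOWN=All_20170710.vcf.gz

# Print a command; change the body to "$@" to actually execute it.
run() { printf '%s\n' "$*"; }

pipeline() {
  # 1. SAM -> BAM, keeping only properly paired reads (the filter used in the post)
  run samtools view -b -f 0x02 -o step1.bam input.sam
  # 2. Validation
  run $PICARD ValidateSamFile I=step1.bam MODE=SUMMARY
  # 3. Queryname sort so that mates are adjacent for fixmate
  run $PICARD SortSam I=step1.bam O=step3.qsorted.bam SORT_ORDER=queryname
  # 4. Fix mate information
  run $PICARD FixMateInformation I=step3.qsorted.bam O=step4.fixmate.bam
  # 5. Coordinate sort (with index) for the downstream tools
  run $PICARD SortSam I=step4.fixmate.bam O=step5.sorted.bam SORT_ORDER=coordinate CREATE_INDEX=true
  # 6. Remove duplicates
  run $PICARD MarkDuplicates I=step5.sorted.bam O=step6.dedup.bam M=dup_metrics.txt REMOVE_DUPLICATES=true CREATE_INDEX=true
  # 7. Indel realignment (two GATK 3 walkers)
  run $GATK -T RealignerTargetCreator -R $REF -I step6.dedup.bam -o targets.intervals
  run $GATK -T IndelRealigner -R $REF -I step6.dedup.bam -targetIntervals targets.intervals -o step7.realigned.bam
  # 8. BQSR: build the recalibration table, then apply it
  run $GATK -T BaseRecalibrator -R $REF -I step7.realigned.bam -knownSites $KNOWN -o recal.table
  run $GATK -T PrintReads -R $REF -I step7.realigned.bam -BQSR recal.table -o step8.recalibrated.bam
}

pipeline
```

Keeping the commands behind a single `run` wrapper makes it easy to review the exact order (queryname sort before fixmate, coordinate sort after it) before switching to real execution.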

Q1: I am using the following command for the SAM-to-BAM conversion:

```
samtools view -S -@ 30 -M -f 0x02 -b input_sam -o input_bam
```

Here `-f 0x02` keeps only properly paired reads. I have checked how many reads this drops (those flagged 0x04, 0x08, etc.), and it is a very small amount: the original SAM file is 26 GB, while a SAM containing all the reads without the 0x02 flag is only 152 MB. Is it OK to skip everything except the properly paired reads? Without `-f`, I get many errors that stop the whole pipeline.
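To see exactly which reads `-f 0x02` drops, it helps to decode the FLAG field. Below is a minimal bash sketch; the function names are hypothetical helpers, not samtools commands. It lists the bits set in a FLAG using the same names `samtools flags` prints, and mimics the filter logic of `-f` (all bits of the mask required) and `-F` (no bit of the mask allowed).

```shell
#!/usr/bin/env bash
# Bit names in SAM FLAG order (0x1, 0x2, 0x4, ... 0x800), as printed by `samtools flags`.
SAM_FLAG_NAMES=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)

# Print the comma-separated names of the bits set in a FLAG (decimal or 0x-prefixed hex).
decode_sam_flag() {
  local flag=$(( $1 )) i
  local -a set_bits=()
  for i in "${!SAM_FLAG_NAMES[@]}"; do
    if (( flag & (1 << i) )); then
      set_bits+=("${SAM_FLAG_NAMES[i]}")
    fi
  done
  local IFS=,
  printf '%s\n' "${set_bits[*]}"
}

# Succeed if every bit of MASK ($2) is set in FLAG ($1): the read passes `-f MASK`.
passes_require() { (( ($1 & $2) == $2 )); }

# Succeed if no bit of MASK ($2) is set in FLAG ($1): the read passes `-F MASK`.
passes_exclude() { (( ($1 & $2) == 0 )); }
```

For example, `decode_sam_flag 137` prints `PAIRED,MUNMAP,READ2`, a mapped read whose mate did not map; `passes_require 137 0x02` fails, so `-f 0x02` discards it, while a typical properly paired read such as FLAG 99 (`PAIRED,PROPER_PAIR,MREVERSE,READ1`) is kept.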

Q2: Up to indel realignment everything works. I check the BAM file at each stage with `samtools view -c outfile.bam` and `samtools quickcheck -qv outfile.bam`.
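Since the symptom here is a truncated BAM, note that one of the things `samtools quickcheck` looks for is the 28-byte BGZF end-of-file block that every complete BAM ends with. The following is a minimal bash sketch of only that EOF check, using `tail`, `od` and `tr`; the helper names are hypothetical, and it does not replace quickcheck, which also validates the header.

```shell
#!/usr/bin/env bash
# Minimal sketch: does a BAM file end with the BGZF end-of-file (EOF) block?
# A BAM cut off mid-write (e.g. by a crashing JVM) normally lacks this block.
# The 28 bytes are the empty BGZF block from the SAM/BAM specification,
# written as octal escapes for printf.
BGZF_EOF='\037\213\010\004\000\000\000\000\000\377'   # gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS
BGZF_EOF+='\006\000\102\103\002\000\033\000'          # XLEN=6, subfield "BC", SLEN=2, BSIZE=27
BGZF_EOF+='\003\000'                                  # empty deflate payload
BGZF_EOF+='\000\000\000\000\000\000\000\000'          # CRC32=0, ISIZE=0

# Hex-dump stdin as one string; comparing hex avoids storing NUL bytes in variables.
to_hex() { od -An -tx1 | tr -d ' \n'; }

# Succeed if FILE exists and its last 28 bytes are the BGZF EOF block.
has_bgzf_eof() {
  local file=$1
  [ -f "$file" ] || return 1
  # The marker is used as the printf format so that its \NNN escapes are expanded.
  [ "$(tail -c 28 "$file" | to_hex)" = "$(printf "$BGZF_EOF" | to_hex)" ]
}
```

Usage: `for f in *.bam; do has_bgzf_eof "$f" || echo "missing EOF block: $f"; done`.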

Both checks pass at every stage up to indel realignment. During BQSR, however, I always get the following:

  1. This warning during covariate estimation:

```
WARN 04:28:08,395 IndexDictionaryUtils - Track knownSites doesn't have a sequence dictionary built in,skipping dictionary validation
```

  2. This error while applying BQSR:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f2453644c7e, pid=41985, tid=0x00007f1c36250700
#
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x61fc7e]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
```

The error report contains the following:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f0c1e489c7e, pid=15994, tid=0x00007f040204e700
#
# JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x61fc7e]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#

--------------- T H R E A D ---------------

Current thread (0x00007f0c18029800): GCTaskThread [stack: 0x00007f0401f4e000,0x00007f040204f000] [id=16002]

siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000
```

I can't post the complete error report because it is too long. The error did not appear when I first ran the pipeline; it started after I experimented with `-nt` and `-nct`, and it still occurs even after removing all `-nt` and `-nct` options. Can you suggest how to fix it?

Thanks.

Answers

  • SkyWarrior · Turkey · Member ✭✭✭

    What is your GATK command line, and which version are you using? Vanilla GATK 3.8 is prone to this error because of an Intel GKL compression bug that causes a segfault if you assign more than 30 GB of heap. GATK 3.8-1 and the 4.0.x.x releases are free of this bug. GATK 3.7, on the other hand, does not have it because it does not use the Intel GKL compressor.

  • vivekruhela · Member

    I am using "The Genome Analysis Toolkit (GATK) v3.8-0-ge9d806836"

    The command lines are as follows.
    For the recalibration table:

    ```
    java -Xms32g -Djava.io.tmpdir=/tmp -jar GenomeAnalysisTK.jar -T BaseRecalibrator \
        -R reference.fasta -I indel_realignment_outfile.bam \
        -knownSites All_20170710.vcf.gz -o outfile.table
    ```

    For recalibrating the BAM:

    ```
    java -Xms32g -jar GenomeAnalysisTK.jar -T PrintReads \
        -R reference.fasta -I indel_realignment_outfile.bam \
        -BQSR outfile.table -o outfile_recalibrated.bam
    ```

  • vivekruhela · Member

    @SkyWarrior: Thanks for your suggestions. I removed `-Xms32g` and now it works fine, although I don't know why. However, that makes GATK slow: I had set the large heap to make it faster, because GATK doesn't support parallel computation. Is there any way to make it faster on a server?

  • SkyWarrior · Turkey · Member ✭✭✭

    GATK 3.8 has a memory-allocation bug caused by an old version of Intel GKL. The GKL is updated in the latest 3.8-1 release and in the GATK 4.0 releases, so they don't have this bug. If you want to stay on 3.8 and use a large heap, pass the switches `--jdk-inflater --jdk-deflater` to disable Intel GKL for decompression and compression; you won't get these errors that way either. The JDK inflater and deflater are slower, but you can safely use `-nt` or `-nct` (up to a point: going beyond 4 or 8 threads barely gives you any advantage).

  • vivekruhela · Member

    What about my Q1?

  • SkyWarrior · Turkey · Member ✭✭✭

    It is up to you. My preference: I never eliminate any reads from my files.
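The first answer ties the crash to heap sizes above 30 GB on vanilla GATK 3.8-0, and the commands in this thread used `-Xms32g` (`-Xms` sets the initial heap, `-Xmx` the maximum). A small pre-flight check along those lines could flag this before a long run. This is a hedged sketch: the function names are hypothetical, and the 30 GB threshold comes from the answer above rather than from GATK documentation.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for GATK 3.8-0 heap settings.
# The 30 GB threshold is taken from the forum answer, not from GATK docs.

# Convert a JVM heap option (-Xms/-Xmx with optional k/m/g suffix) to MiB.
# A bare number is interpreted as bytes, as the JVM does.
heap_option_to_mib() {
  local re='^-Xm[sx]([0-9]+)([kKmMgG]?)$'
  [[ $1 =~ $re ]] || return 1
  local value=${BASH_REMATCH[1]} unit=${BASH_REMATCH[2],,}
  case $unit in
    g) echo $(( 10#$value * 1024 )) ;;
    m) echo $(( 10#$value )) ;;
    k) echo $(( 10#$value / 1024 )) ;;
    *) echo $(( 10#$value / 1024 / 1024 )) ;;
  esac
}

# Return 1 with a warning if the option asks for more than 30 GB of heap,
# 2 if the option is not a recognised -Xms/-Xmx value, 0 otherwise.
check_gatk38_heap() {
  local mib
  mib=$(heap_option_to_mib "$1") || { echo "unrecognised heap option: $1" >&2; return 2; }
  if (( mib > 30 * 1024 )); then
    echo "warning: $1 requests more than 30 GB; GATK 3.8-0 may segfault with the Intel GKL" >&2
    return 1
  fi
}
```

For example, `check_gatk38_heap -Xms32g` prints the warning and returns 1, while `check_gatk38_heap -Xmx24g` passes silently.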
