
picard 2.6.0 ValidateSamFile produces an exception for BGZF file with invalid uncompressedLength

grg Australia Member

Hi,

I ran out of disk space while running our mapping pipeline. When running picard's ValidateSamFile tool on one of the bam files, it produced the Exception below. I'm not sure this is a bug per se, but I did expect ValidateSamFile to handle this more gracefully.

-G

[Thu Oct 20 08:53:01 CST 2016] picard.sam.ValidateSamFile INPUT=Bos_javanicus_lowi_HiSeq2000_DRR016397/collapsed.bam IGNORE=[MATE_NOT_FOUND, INVALID_QUALITY_FORMAT] MAX_OPEN_TEMP_FILES=7782 MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Thu Oct 20 08:53:01 CST 2016] Executing as [email protected] on Linux 2.6.32-504.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Picard version: 2.6.0-SNAPSHOT
[Thu Oct 20 08:53:10 CST 2016] picard.sam.ValidateSamFile done. Elapsed time: 0.15 minutes.
Runtime.totalMemory()=2075918336
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: BGZF file has invalid uncompressedLength: -477174215
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:407)
at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:392)
at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:260)
at java.io.DataInputStream.read(DataInputStream.java:149)
at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:490)
at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:501)
at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:177)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:682)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:656)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:545)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:519)
at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:269)
at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:200)
at htsjdk.samtools.SamFileValidator.validateSamFileVerbose(SamFileValidator.java:160)
at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:199)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.lang.NegativeArraySizeException
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:405)
... 22 more
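For readers hitting the same trace: the negative number in the message is consistent with a block's trailing length field being read as a signed 32-bit integer from garbage bytes, followed by an attempt to allocate an array of that size. The sketch below is not htsjdk's code. It is a minimal, hypothetical checker (the name `check_bgzf_blocks` and the assumption that the `BC` subfield comes first in the gzip extra field are mine) that walks BGZF blocks as laid out in the SAM/BAM specification and reports the first bad block instead of throwing.

```python
import struct

BGZF_MAX_BLOCK = 65536  # BGZF caps each block's uncompressed payload at 64 KiB
HEADER_LEN = 18         # 12-byte gzip header + 6-byte "BC" extra subfield
MIN_BLOCK_LEN = 26      # header + 8-byte trailer (CRC32 + ISIZE), no payload


def check_bgzf_blocks(data: bytes):
    """Walk the BGZF blocks in `data`.

    Returns (number_of_good_blocks, problem), where problem is None
    if every block looks structurally sound.
    """
    pos, good = 0, 0
    while pos < len(data):
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN:
            return good, f"truncated block header at offset {pos}"
        # gzip magic, deflate method, FEXTRA flag set
        if header[:4] != b"\x1f\x8b\x08\x04":
            return good, f"bad BGZF magic at offset {pos}"
        # Assumption: the BC subfield is the first (usually only) extra subfield.
        if header[12:14] != b"BC":
            return good, f"missing BC subfield at offset {pos}"
        # BSIZE is stored as (total block size - 1), little-endian uint16.
        block_len = struct.unpack_from("<H", header, 16)[0] + 1
        if block_len < MIN_BLOCK_LEN:
            return good, f"implausible block size {block_len} at offset {pos}"
        if pos + block_len > len(data):
            return good, f"truncated block at offset {pos}"
        # ISIZE: last 4 bytes of the block, read here as a signed int
        # to mirror how a Java int would see it.
        isize = struct.unpack_from("<i", data, pos + block_len - 4)[0]
        if not 0 <= isize <= BGZF_MAX_BLOCK:
            return good, f"invalid uncompressed length {isize} at offset {pos}"
        pos += block_len
        good += 1
    return good, None
```

Calling `check_bgzf_blocks(open("collapsed.bam", "rb").read())` would return the count of intact blocks and a description of the first problem. Reading the whole file into memory is only reasonable for small files; a streaming version would read one block at a time.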

Issue · Github
by Sheila

Issue Number: 1386
State: closed
Closed By: chandrans

Answers

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭

    @grg
    Hi G,

    Can you try removing IGNORE=[MATE_NOT_FOUND, INVALID_QUALITY_FORMAT] from your command and running with MODE=SUMMARY?

    Thanks,
    Sheila

  • grg Australia Member

    The following command produces the same exception.

    $ java -jar /localscratch/Programs/picard-2.6.0/picard.jar ValidateSamFile I=collapsed.bam MODE=SUMMARY

    [Sun Oct 23 10:18:51 CST 2016] picard.sam.ValidateSamFile INPUT=collapsed.bam MODE=SUMMARY MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
    [Sun Oct 23 10:18:51 CST 2016] Executing as [email protected] on Linux 2.6.32-504.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Picard version: 2.6.0-SNAPSHOT
    [Sun Oct 23 10:18:57 CST 2016] picard.sam.ValidateSamFile done. Elapsed time: 0.09 minutes.
    Runtime.totalMemory()=2595749888
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: BGZF file has invalid uncompressedLength: -477174215
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:407)
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:392)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:260)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:490)
    at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:501)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:177)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:682)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:656)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:650)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:620)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:545)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:519)
    at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:269)
    at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:200)
    at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:128)
    at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:196)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:208)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
    Caused by: java.lang.NegativeArraySizeException
    at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:405)
    ... 22 more

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭

    @grg
    Hi G,

    Okay. Can you tell us what steps/tools you ran on the BAM file before you got this error message? Can you pinpoint a step that caused the error message?

    Thanks,
    Sheila

  • grg Australia Member

    The bam file was produced with 'bwa mem ... | samtools view -b'. But the file is corrupt/incomplete because the disk became full during the run. When I run samtools view on it I get the following:

    $ samtools view -H collapsed.bam > /dev/null
    [bam_header_read] EOF marker is absent. The input is probably truncated.

    The question for your developers is: should this exception (and others of its ilk) be caught and handled gracefully, or is this deemed not to be within the purview of the ValidateSamFile tool?
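The samtools message above refers to the fixed 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. As a rough pre-check before running a validator, a pipeline could test for that marker directly. The following is a minimal sketch under that assumption; the function name `has_bgzf_eof` is hypothetical and not part of samtools, Picard, or htsjdk.

```python
import os

# The 28-byte empty BGZF block used as the EOF marker (per the SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Read only the final 28 bytes of the file.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF
```

A file that fails this check was almost certainly cut short, for example by a full disk. Passing it does not prove the file is intact, since earlier blocks can still be damaged.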

    Issue · Github
    by Sheila

    Issue Number: 1381
    State: closed
    Closed By: vdauwera

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭
    edited October 2016

    @grg
    Hi G,

    Got it. Yes, it may be better to handle this more gracefully. I am going to make a note to the team. I suspect this fix will go into GATK4.

    Thanks,
    Sheila

    EDIT: Also, did you get the error message from BWA that something went wrong?

  • grg Australia Member

    @Sheila said:
    EDIT: Also, did you get the error message from BWA that something went wrong?

    I'm not sure. This was within a mapping pipeline, which halted when the disk became full. When I restarted the pipeline, the bam validation step failed on this file.

  • Sheila Broad Institute Member, Broadie ✭✭✭✭✭

    @grg
    Hi G,

    Okay. Thanks. I am going to put in a bug report to Picard so there is a check for this.

    -Sheila

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    @grg We're having some trouble replicating your error, despite having created truncated bams. Would you be able to share your problem bam with us?

  • grg Australia Member

    Sorry @Geraldine_VdAuwera, I have now deleted the truncated bams. They were large and I needed disk space.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Ah, fair enough. Unfortunately that means we're not going to be able to debug in depth -- but we'll keep an eye out in case this happens again.