FireCloud Sort and BQSR failing - java.io.IOException: Broken pipe

kmegq (Member, Broadie)

Dear FireCloud team,

We have been running a scaled-down version of the GATK Best Practices preprocessing WDL, edited so that it runs only SortAndFixTags followed by BQSR, on whole-genome sequencing data from dogs (alignment and MarkDuplicates had already been run by the Genomics Platform). The pipeline worked well on 33 of the 34 samples we have run so far, but one sample failed with the following errors:

Elapsed time: 12:07:19s.  Time for last 10,000,000:  222s.  Last read position: chr6:45,839,960
[Sun Apr 07 07:15:32 UTC 2019] picard.sam.SetNmAndUqTags done. Elapsed time: 727.50 minutes.
Runtime.totalMemory()=1094713344
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.SAMException: Exception when processing alignment for BAM index HFTLGALXX170510:4:1116:22262:20436 1/2 151b aligned to chr6:46451817-46451967.
    at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:140)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:185)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:36)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:16)
    at htsjdk.samtools.util.AbstractAsyncWriter$WriterRunnable.run(AbstractAsyncWriter.java:123)
    at java.lang.Thread.run(Thread.java:748)
Caused by: htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
    at htsjdk.samtools.util.BlockCompressedOutputStream.writeGzipBlock(BlockCompressedOutputStream.java:429)
    at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:392)
    at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:291)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:188)
    at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:234)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:150)
    at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:133)
    ... 5 more
Caused by: java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at htsjdk.samtools.util.Md5CalculatingOutputStream.write(Md5CalculatingOutputStream.java:89)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    ... 13 more
Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms500m -jar /gatk/gatk-package-4.1.0.0-local.jar SetNmAndUqTags --INPUT /dev/stdin --OUTPUT MR415976_N.sorted.bam --CREATE_INDEX true --CREATE_MD5_FILE true --REFERENCE_SEQUENCE /cromwell_root/fc-0b0cb3ce-e2cb-4aef-a8b2-08e60d78e87c/Canis_lupus_familiaris_assembly3.fasta
[Sun Apr 07 07:15:32 UTC 2019] picard.sam.SortSam done. Elapsed time: 727.51 minutes.
Runtime.totalMemory()=4192731136
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
    at htsjdk.samtools.util.BlockCompressedOutputStream.writeGzipBlock(BlockCompressedOutputStream.java:429)
    at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:392)
    at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:291)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:212)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:164)
    at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:143)
    at htsjdk.samtools.SAMFileWriterImpl.close(SAMFileWriterImpl.java:210)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyClose(AsyncSAMFileWriter.java:38)
    at htsjdk.samtools.util.AbstractAsyncWriter.close(AbstractAsyncWriter.java:89)
    at picard.sam.SortSam.doWork(SortSam.java:167)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
    at org.broadinstitute.hellbender.Main.main(Main.java:291)
Caused by: java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    ... 16 more
Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms4000m -jar /gatk/gatk-package-4.1.0.0-local.jar SortSam --INPUT /cromwell_root/fc-8268e82b-ed61-4e04-a8c9-a95a05c0952e/MR415976_2.bam --OUTPUT /dev/stdout --SORT_ORDER coordinate --CREATE_INDEX false --CREATE_MD5_FILE false
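In the WDL command block these two invocations run as one shell pipeline: the SortSam command just above writes to /dev/stdout, and the SetNmAndUqTags command earlier reads /dev/stdin. When the downstream stage dies (here from No space left on device), the upstream writer's next write fails with Broken pipe, so the SortSam error is a symptom rather than the root cause. A minimal sketch of that mechanism, using `yes` and `head` as stand-ins for the two tools:

```shell
# Stand-ins for SortSam | SetNmAndUqTags: `yes` streams output forever,
# while `head` (the "downstream" stage) exits early, closing the read end
# of the pipe. The next write by `yes` then fails with EPIPE/SIGPIPE --
# the same mechanism behind "java.io.IOException: Broken pipe" above.
set -o pipefail             # surface the upstream failure in the exit code
yes | head -n 1 > /dev/null
status=$?                   # 141 = 128 + SIGPIPE(13): writer killed by the closed pipe
echo "pipeline exit status: $status"
```

With `pipefail` set, the pipeline's exit status exposes the upstream SIGPIPE death that a plain pipeline (which reports only the last command's status) would hide.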

Any thoughts as to what may be going on?

I have checked that the BAM and index file uploaded to FireCloud correctly (md5sums match), and I have run Picard's ValidateSamFile on the BAM without getting any errors.

Thank you for your help!

Best regards,
Kate

Answers

  • bshifaw (Member, Broadie, Moderator, Admin)

    Part of the error message mentions Caused by: java.io.IOException: No space left on device.
    Hopefully, increasing the disk space in the task's runtime parameters will resolve the error.
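    A minimal sketch of what that could look like in the task's runtime block (the expressions and numbers here are illustrative assumptions, not the exact values from the published WDL); sizing the disk from the input avoids guessing a fixed number:

    ```wdl
    # Illustrative sketch: derive the disk request from the input size.
    # A ~115 GB input BAM plus the sorted output, index/md5 files, and
    # sorting spill space can easily exceed a fixed default.
    Float input_size = size(input_bam, "GB")
    Int disk_gb = ceil(input_size * 3) + 20

    runtime {
      docker: "broadinstitute/gatk:4.1.0.0"
      memory: "16 GB"
      disks: "local-disk " + disk_gb + " HDD"
    }
    ```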

  • kmegq (Member, Broadie)

    Thank you, Beri! I tried increasing the memory on the runtime options, and it looks like it failed again:

    Elapsed time: 06:33:05s.  Time for last 10,000,000:  206s.  Last read position: chr19:50,361,977
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms7000m -jar /gatk/gatk-package-4.1.0.0-local.jar SortSam --INPUT /cromwell_root/fc-8268e82b-ed61-4e04-a8c9-a95a05c0952e/MR415976_2.bam --OUTPUT /dev/stdout --SORT_ORDER coordinate --CREATE_INDEX false --CREATE_MD5_FILE false
    [Thu Apr 11 02:16:43 UTC 2019] picard.sam.SetNmAndUqTags done. Elapsed time: 393.92 minutes.
    Runtime.totalMemory()=2092957696
    To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
    htsjdk.samtools.FileTruncatedException: Premature end of file: data stream
        at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:530)
        at htsjdk.samtools.util.BlockCompressedInputStream.nextBlock(BlockCompressedInputStream.java:468)
        at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:458)
        at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:196)
        at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:331)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:421)
        at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
        at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:507)
        at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:518)
        at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:264)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:838)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:812)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:806)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:774)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
        at java.util.Iterator.forEachRemaining(Iterator.java:116)
        at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
        at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
        at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
        at picard.sam.SetNmMdAndUqTags.doWork(SetNmMdAndUqTags.java:141)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
        at org.broadinstitute.hellbender.cmdline.PicardCommandLineProgramExecutor.instanceMain(PicardCommandLineProgramExecutor.java:25)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:162)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:205)
        at org.broadinstitute.hellbender.Main.main(Main.java:291)
    Using GATK jar /gatk/gatk-package-4.1.0.0-local.jar
    Running:
        java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms2000m -jar /gatk/gatk-package-4.1.0.0-local.jar SetNmAndUqTags --INPUT /dev/stdin --OUTPUT MR415976_N.sorted.bam --CREATE_INDEX true --CREATE_MD5_FILE true --REFERENCE_SEQUENCE /cromwell_root/fc-0b0cb3ce-e2cb-4aef-a8b2-08e60d78e87c/Canis_lupus_familiaris_assembly3.fasta
    

    Do you have any suggestions on which parameter might be the problem, or how large a limit to try? The BAM is 115 GB, and the pipeline is trying to run BQSR. I'm a little surprised, as a 103 GB BAM ran fine using the default settings from your GATK preprocessing WDL.

    Thanks again!

    Best,
    Kate

  • bshifaw (Member, Broadie, Moderator, Admin)

    On the plus side, it looks like it got a little further through the sample this time. The error message doesn't mention memory or disk space, so those shouldn't be the problem.

    The error message is
    htsjdk.samtools.FileTruncatedException: Premature end of file: data stream.
    There's an article that suggests the file itself might be corrupt. Can you confirm that the sample file isn't corrupt and that it ran through the tasks prior to this one fine? I was thinking perhaps the prior task didn't complete properly, or something went wrong when copying/compressing the file.
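    One quick truncation check worth adding: per the SAM/BAM specification, every intact BAM file ends with a fixed 28-byte BGZF end-of-file marker, and a copy cut short in transit (consistent with "Premature end of file") will fail this test. A sketch (the usage path is hypothetical):

```shell
# Every intact BGZF/BAM file ends with a fixed 28-byte empty-block EOF
# marker (defined in the SAM spec's BGZF section). A file missing it was
# truncated -- consistent with "Premature end of file: data stream".
bgzf_eof="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Print the last 28 bytes of a file as a lowercase hex string.
check_bam_eof() {
  tail -c 28 "$1" | od -An -tx1 | tr -d ' \n'
}

# Usage (hypothetical path):
#   [ "$(check_bam_eof MR415976_2.bam)" = "$bgzf_eof" ] && echo intact || echo truncated
```

    This only detects truncation, not other corruption, so it complements rather than replaces ValidateSamFile.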
