
No Space Left on Device

Hello!

I'm currently running WGS data for 31 samples through the default "processing-for-variant-discovery-gatk4" workflow. When I ran my uBAMs through it, roughly half of them failed at the MergeBamAlignment step with the same stderr message; an example is pasted below:

INFO    2018-06-28 05:37:31 AbstractAlignmentMerger 47949359 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:38:05 AbstractAlignmentMerger Merged   509,000,000 records.  Elapsed time: 04:33:28s.  Time for last 1,000,000:   33s.  Last read position: chr4:3,404,335
INFO    2018-06-28 05:38:05 AbstractAlignmentMerger 48045092 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:38:37 AbstractAlignmentMerger Merged   510,000,000 records.  Elapsed time: 04:34:01s.  Time for last 1,000,000:   32s.  Last read position: */*
INFO    2018-06-28 05:38:37 AbstractAlignmentMerger 48140059 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:39:09 AbstractAlignmentMerger Merged   511,000,000 records.  Elapsed time: 04:34:33s.  Time for last 1,000,000:   32s.  Last read position: */*
INFO    2018-06-28 05:39:09 AbstractAlignmentMerger 48235899 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:39:41 AbstractAlignmentMerger Merged   512,000,000 records.  Elapsed time: 04:35:05s.  Time for last 1,000,000:   32s.  Last read position: chr4:60,986,275
INFO    2018-06-28 05:39:41 AbstractAlignmentMerger 48330565 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:40:13 AbstractAlignmentMerger Merged   513,000,000 records.  Elapsed time: 04:35:37s.  Time for last 1,000,000:   32s.  Last read position: chr15:27,220,660
INFO    2018-06-28 05:40:13 AbstractAlignmentMerger 48425295 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:40:46 AbstractAlignmentMerger Merged   514,000,000 records.  Elapsed time: 04:36:10s.  Time for last 1,000,000:   32s.  Last read position: chr3:159,444,239
INFO    2018-06-28 05:40:46 AbstractAlignmentMerger 48519800 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:41:18 AbstractAlignmentMerger Merged   515,000,000 records.  Elapsed time: 04:36:42s.  Time for last 1,000,000:   32s.  Last read position: chr10:12,075,925
INFO    2018-06-28 05:41:18 AbstractAlignmentMerger 48615263 Reads have been unmapped due to being suspected of being Cross-species contamination.
INFO    2018-06-28 05:41:51 AbstractAlignmentMerger Merged   516,000,000 records.  Elapsed time: 04:37:15s.  Time for last 1,000,000:   32s.  Last read position: */*
INFO    2018-06-28 05:41:51 AbstractAlignmentMerger 48709353 Reads have been unmapped due to being suspected of being Cross-species contamination.
[Thu Jun 28 05:42:13 UTC 2018] picard.sam.MergeBamAlignment done. Elapsed time: 277.63 minutes.
Runtime.totalMemory()=3040870400
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
htsjdk.samtools.util.RuntimeIOException: Write error; BinaryCodec in writemode; streamed file (filename not available)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:222)
    at htsjdk.samtools.util.BlockCompressedOutputStream.writeGzipBlock(BlockCompressedOutputStream.java:389)
    at htsjdk.samtools.util.BlockCompressedOutputStream.deflateBlock(BlockCompressedOutputStream.java:352)
    at htsjdk.samtools.util.BlockCompressedOutputStream.write(BlockCompressedOutputStream.java:251)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    at htsjdk.samtools.util.BinaryCodec.writeByteBuffer(BinaryCodec.java:188)
tee: /cromwell_root/MergeBamAlignment-0-stderr.log: No space left on device
    at htsjdk.samtools.util.BinaryCodec.writeInt(BinaryCodec.java:234)
    at htsjdk.samtools.BAMRecordCodec.encode(BAMRecordCodec.java:134)
    at htsjdk.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:143)
    at htsjdk.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:198)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:36)
    at htsjdk.samtools.AsyncSAMFileWriter.synchronouslyWrite(AsyncSAMFileWriter.java:16)
    at htsjdk.samtools.util.AbstractAsyncWriter$WriterRunnable.run(AbstractAsyncWriter.java:123)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: No space left on device
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78)
    at java.nio.channels.Channels.writeFully(Channels.java:101)
    at java.nio.channels.Channels.access$000(Channels.java:61)
    at java.nio.channels.Channels$1.write(Channels.java:174)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at htsjdk.samtools.util.BinaryCodec.writeBytes(BinaryCodec.java:220)
    ... 13 more
Using GATK jar /gatk/build/libs/gatk-package-4.0.4.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Dsamjdk.compression_level=5 -Xms3000m -jar /gatk/build/libs/gatk-package-4.0.4.0-local.jar MergeBamAlignment --VALIDATION_STRINGENCY SILENT --EXPECTED_ORIENTATIONS FR --ATTRIBUTES_TO_RETAIN X0 --ALIGNED_BAM /cromwell_root/fc-4449151c-8501-4474-a203-83d0c4dbd051/b66d34b9-d482-4325-9667-17466fddf3a4/PreProcessingForVariantDiscovery_GATK4/f1da863c-a52c-4bfe-98e3-d3a12887928a/call-SamToFastqAndBwaMem/shard-0/attempt-4/857_2976_2_2_H7VLWALXX_1.unmapped.unmerged.bam --UNMAPPED_BAM /cromwell_root/fc-4449151c-8501-4474-a203-83d0c4dbd051/uBAMs/857_2976_2_2/857_2976_2_2_H7VLWALXX_1.unmapped.bam --OUTPUT 857_2976_2_2_H7VLWALXX_1.unmapped.aligned.unsorted.bam --REFERENCE_SEQUENCE /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --PAIRED_RUN true --SORT_ORDER unsorted --IS_BISULFITE_SEQUENCE false --ALIGNED_READS_ONLY false --CLIP_ADAPTERS false --MAX_RECORDS_IN_RAM 2000000 --ADD_MATE_CIGAR true --MAX_INSERTIONS_OR_DELETIONS -1 --PRIMARY_ALIGNMENT_STRATEGY MostDistant --PROGRAM_RECORD_ID bwamem --PROGRAM_GROUP_VERSION 0.7.15-r1140 --PROGRAM_GROUP_COMMAND_LINE bwa mem -K 100000000 -p -v 3 -t 16 -Y /cromwell_root/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta --PROGRAM_GROUP_NAME bwamem --UNMAPPED_READ_STRATEGY COPY_TO_TAG --ALIGNER_PROPER_PAIR_FLAGS true --UNMAP_CONTAMINANT_READS true
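Since roughly half of the 31 samples failed with what looks like the same error, one quick way to confirm that every failure is the disk-full case (rather than a mix of problems) is to download the stderr logs and search them for the "No space left on device" message that appears twice in the log above. This is a minimal sketch that assumes the logs have been copied into a local directory and that their file names contain "stderr", as MergeBamAlignment-0-stderr.log does; the directory layout and the function names are hypothetical.

```python
from pathlib import Path

# Message that, in the log above, signals the task's disk filled up.
DISK_FULL_MARKER = "No space left on device"


def is_disk_full_failure(log_text: str) -> bool:
    """Return True if the log text contains the disk-full message."""
    return DISK_FULL_MARKER in log_text


def triage_logs(log_dir: str, glob: str = "*stderr*") -> dict:
    """Map each stderr log under log_dir (searched recursively) to a flag
    saying whether it failed with the disk-full message."""
    results = {}
    for path in sorted(Path(log_dir).rglob(glob)):
        if path.is_file():
            # errors="replace" keeps going if a log has stray bytes.
            text = path.read_text(errors="replace")
            results[str(path)] = is_disk_full_failure(text)
    return results
```

For example, sum(triage_logs("mba_logs").values()) would count how many of the downloaded logs in a hypothetical mba_logs directory hit the disk-full error.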

My question is two-fold:
1. How can I resolve the "Caused by: java.io.IOException: No space left on device" error?
2. The example above has 48,709,353 reads unmapped as suspected cross-species contamination out of 516,000,000 merged records. Is this a particularly high number?
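The ratio in question 2 can be computed directly from the progress lines in the log. This minimal sketch assumes the two message formats shown above and takes the last value of each running counter; because the counters are only printed every 1,000,000 records, the result is approximate. The function and pattern names are illustrative.

```python
import re

# "Merged   516,000,000 records." -> "516,000,000"
MERGED_RE = re.compile(r"Merged\s+([\d,]+) records")
# "48709353 Reads have been unmapped due to ... Cross-species contamination."
UNMAPPED_RE = re.compile(
    r"(\d+) Reads have been unmapped due to being suspected of being "
    r"Cross-species contamination"
)


def contamination_fraction(log_text: str):
    """Return (unmapped_reads, merged_records, fraction) using the last
    value of each running counter found in the log."""
    merged = MERGED_RE.findall(log_text)
    unmapped = UNMAPPED_RE.findall(log_text)
    if not merged or not unmapped:
        raise ValueError("no MergeBamAlignment progress lines found")
    merged_records = int(merged[-1].replace(",", ""))
    unmapped_reads = int(unmapped[-1])
    return unmapped_reads, merged_records, unmapped_reads / merged_records
```

On the excerpt above this gives 48,709,353 / 516,000,000, or about 9.4%.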

Thank you for the help!
Lee

Answers

  • Tiffany_at_Broad (Cambridge, MA) Member, Broadie, Moderator, admin
    edited June 2018

    Hi Lee @lrao
    1. In your WDL, you can specify a bootDiskSizeGb attribute in the runtime section of each of your tasks, and that should fix this. See more about how to do this here.
    2. I will ask @Sheila if she can answer your GATK question.
    Hope this helps!

  • lrao Member
    edited June 2018

    Hi Tiffany,

    I don't think that's the issue, because based on my error logs I'm able to pull the Docker image just fine. I'll try it anyway, though, and keep you updated on the results.
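    The tee error in the original log shows that the filesystem holding /cromwell_root is the one that filled up. A small check like the following, run inside the task, would report how much free space that filesystem has. The /cromwell_root path comes from the log; the helper names and any threshold you pass are hypothetical.

```python
import shutil

BYTES_PER_GIB = 1024 ** 3


def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / BYTES_PER_GIB


def has_room(path: str, needed_gib: float) -> bool:
    """True if the filesystem holding `path` has at least `needed_gib` free."""
    return free_gib(path) >= needed_gib
```

    For example, has_room("/cromwell_root", 100) is True only if at least 100 GiB are free on the disk that holds the task's working directory.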

    Update: I'm having some difficulty editing the original WDL file on FireCloud. Searching the forums, I found a rather involved solution (https://gatkforums.broadinstitute.org/firecloud/discussion/7863/how-to-use-my-own-method-in-firecloud-workspace) for the very small change I'd like to make. Is there an easier way to go about this?

    Thanks for the quick response!
    Lee

  • Tiffany_at_Broad (Cambridge, MA) Member, Broadie, Moderator, admin

    Hi @lrao
    Is this the WDL you are using? If so, you can clone the method in the Method Repo, then update the WDL code.
    You could try increasing the disk size for the failing task, either in the method configuration if that parameter is exposed there, or directly in the WDL if it is hardcoded. Let me know what you end up tweaking and whether it works!
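    One rough way to pick a disk size for MergeBamAlignment is to add up everything that has to sit on the task's disk: the reference files, the unmapped BAM, the aligned BAM and the merged output BAM, plus some headroom. The helper below is a hypothetical heuristic, not the workflow's own formula; the output_factor and padding_gb defaults are guesses to tune against your real file sizes. It also formats the result as a Cromwell disks runtime value ("local-disk N HDD").

```python
import math


def estimate_disk_gb(reference_gb: float, unmapped_bam_gb: float,
                     aligned_bam_gb: float, output_factor: float = 1.5,
                     padding_gb: float = 20) -> int:
    """Hypothetical disk-size heuristic for a MergeBamAlignment task.

    All inputs are localized onto the task's disk, and the merged output
    BAM is assumed (a guess) to be about output_factor x the aligned BAM.
    """
    inputs = reference_gb + unmapped_bam_gb + aligned_bam_gb
    output = output_factor * aligned_bam_gb
    # Round up so the disk is never smaller than the estimate.
    return math.ceil(inputs + output + padding_gb)


def disks_attribute(size_gb: int, disk_type: str = "HDD") -> str:
    """Format a size as a Cromwell `disks` runtime value."""
    return f"local-disk {size_gb} {disk_type}"
```

    For instance, disks_attribute(estimate_disk_gb(3.0, 50.0, 60.0)) gives "local-disk 223 HDD" for hypothetical 3, 50 and 60 GB inputs.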

  • lrao Member
    edited June 2018

    Thank you all for the help! I'm currently running one sample with @bshifaw's suggestion, so I'll keep you all updated on how it goes.

  • Sheila (Broad Institute) Member, Broadie, Moderator, admin

    @lrao
    Hi Lee,

    Back to your contamination question. That does seem like a lot of "contaminant" reads (about 9%). Any read (plus its mate) that is soft-clipped on both ends and has fewer than 32 aligned bases is considered contamination. What kind of data are you working with (e.g., how was it sequenced)? Have you done any QC on your data?

    You can also try reducing the number of bases that need to be aligned (MergeBamAlignment's MIN_UNCLIPPED_BASES argument, default 32), as described in this article.
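    The criterion described above can be sketched as a per-read check on the CIGAR string. This is a simplified reading of that description, not Picard's implementation: it counts M, = and X operations as aligned bases, does not model how a read and its mate are combined, and uses min_aligned as a stand-in for the MIN_UNCLIPPED_BASES threshold (default 32). The function names are illustrative.

```python
import re

# One CIGAR operation: a length followed by an operator, e.g. "45S".
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str):
    """Split a CIGAR string into (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP_RE.findall(cigar)]
    # Reject strings with characters the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def looks_like_contaminant(cigar: str, min_aligned: int = 32) -> bool:
    """Simplified check: soft-clipped at both ends and fewer than
    min_aligned bases aligned (M, = or X operations)."""
    if cigar == "*":  # unmapped read, nothing to judge
        return False
    # Hard clips can only appear outside soft clips, so drop them first.
    core = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not core:
        return False
    clipped_both_ends = core[0][1] == "S" and core[-1][1] == "S"
    aligned = sum(n for n, op in core if op in ("M", "=", "X"))
    return clipped_both_ends and aligned < min_aligned
```

    With the default threshold, looks_like_contaminant("60S30M60S") returns True; passing min_aligned=25 makes it return False, which illustrates why lowering the threshold unmaps fewer reads.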

    -Sheila

  • lrao Member

    It looks like the disk size fix worked! I'll mark @bshifaw's reply as the accepted answer.

    @Sheila, I'm working with WGS data from neuronal stem cell lines derived from fibroblasts of Parkinson's disease patients. It was Illumina paired-end sequencing with 150 bp reads. I've attached the FastQC output for a file with relatively high contamination (~11%); for the most part everything looks reasonable, although Kmer Content was flagged as poor.

    What would be a "normal" number of reads unmapped due to contamination? I'm having difficulty finding any benchmark values.
