GATK4 SplitNCigarReads RuntimeIOException: Attempt to add record to closed writer.

gerzs Member
edited February 6 in Ask the GATK team

On a Linux cluster, I ran this command on a node (no job scheduler):

./gatk SplitNCigarReads -R /bigdisk/databases/genomes/human/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -I 28_tumor.dedupped.bam -O 28_tumor.split.bam

I get this error during SplitN's second pass:

13:27:28.945 INFO  ProgressMeter -           4:74283282            163.1             288256000        1767147.3
13:27:38.955 INFO  ProgressMeter -           4:74283830            163.3             288523000        1766976.9
13:27:46.176 INFO  SplitNCigarReads - Shutting down engine
[February 6, 2018 1:27:46 PM CET] org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads done. Elapsed time: 163.42 minutes.
Runtime.totalMemory()=12006719488
htsjdk.samtools.util.RuntimeIOException: Attempt to add record to closed writer.
    at htsjdk.samtools.util.AbstractAsyncWriter.write(AbstractAsyncWriter.java:57)
    at htsjdk.samtools.AsyncSAMFileWriter.addAlignment(AsyncSAMFileWriter.java:53)
    at org.broadinstitute.hellbender.utils.read.SAMFileGATKReadWriter.addRead(SAMFileGATKReadWriter.java:21)
    at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.writeReads(OverhangFixingManager.java:349)
    at org.broadinstitute.hellbender.tools.walkers.rnaseq.OverhangFixingManager.flush(OverhangFixingManager.java:329)
    at org.broadinstitute.hellbender.tools.walkers.rnaseq.SplitNCigarReads.closeTool(SplitNCigarReads.java:195)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:897)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:136)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:152)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:195)
    at org.broadinstitute.hellbender.Main.main(Main.java:275)

The output file 28_tumor.split.bam is 0 bytes, and there is an index file that is also 0 bytes.

Java version: 1.8.0_162
GATK version: 4.0.0.0
OS: CentOS release 6.8

I ran this command on a different computer with Ubuntu 16.04 and had no problems. I get the same error with different BAM files. Any ideas? It's frustrating that I can't get GATK to run efficiently on the cluster, only on slow computers or computers with limited disk space. It took a month to run on about 45 pairs of RNA-Seq samples (of course I made errors along the way), so I really need it to run on the cluster.

Thanks,
Zsuzsa

Issue · GitHub, filed by Sheila
Issue #2903: closed (closed by chandrans)
Answers

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @gerzs
    Hi Zsuzsa,

    I need to check with the team on what you can do. I will get back to you soon.

    -Sheila

  • Hi Sheila,
    thanks! Just a note, I now use GATK 3.8 on the cluster, but of course it would be nice to use the latest version.

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @gerzs
    Hi Zsuzsa,

    Sorry for the delay. I am going to ask you to submit a bug report, but before I do, can you try giving the full path to the files? It helped a different user in this thread. Not sure if this will help, but it is worth a shot :smile:

    -Sheila

  • srw6v (United States), Member

    Hi Sheila,
    I'm having the same issue, and it seems to be sporadic. I'm using full paths. It runs fine on smaller, low-memory machines but crashes on my cluster. Any updates would be awesome.

    Stephen

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @srw6v
    Hi Stephen,

    Thanks for letting me know. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • srw6v (United States), Member

    Hi @Sheila and @gerzs,
    I figured out the issue (at least for me). It stems from where SplitNCigarReads writes its temporary files. For me, they were going to the cluster's default temp location, which has very limited disk space. When I redirected this using --TMP_DIR /my/scratch/space, everything went smoothly.
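
    For reference, a sketch of the original command with this workaround applied; the scratch path is a placeholder, so point it at whatever filesystem on your cluster has ample free space:

    ```shell
    # Redirect SplitNCigarReads temp files to a scratch filesystem.
    # SCRATCH is a placeholder path; adjust it to your cluster layout.
    SCRATCH=/my/scratch/space
    CMD="./gatk SplitNCigarReads -R Homo_sapiens.GRCh37.75.dna.primary_assembly.fa -I 28_tumor.dedupped.bam -O 28_tumor.split.bam --TMP_DIR $SCRATCH"
    echo "$CMD"
    ```

    Building the command into a variable first also makes it easy to log exactly what was run in a scheduler script.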

    The part that still confuses me is that I had already set export _JAVA_OPTIONS=-Djava.io.tmpdir=/my/scratch/space. This is not getting picked up by SplitNCigarReads in GATK4 as I would have expected. After much experimenting, I started with a clean environment and set only --TMP_DIR /my/scratch/space, which worked.

    This seems a bit "buggy" to me, and it would be great if the GATK development team could look into it and carry -Djava.io.tmpdir over to --TMP_DIR if possible.

    Thanks,

    Stephen

    Issue · GitHub, filed by Sheila
    Issue #4487: open
  • Hi @Sheila and @srw6v,
    thanks for the suggestions. I had some problems with the tmp directory when running GATK 3.8, but it didn't occur to me that this GATK4 error indicated the same problem. I will try using the full path and setting the --TMP_DIR option.

  • srw6v (United States), Member

    Hi @gerzs,
    Using GATK 3.8 was the only reason I figured it out: its error message was clearer and gave me some indication of the cause, rather than a generic samtools error.

    Stephen

  • srw6v (United States), Member

    Also related: I had to increase the amount of RAM I was using, but this might be specific to my dataset.
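
    In case it helps others: heap size for GATK4 tools is passed through the gatk wrapper's --java-options flag. The 16G figure below is only an example value, not a recommendation:

    ```shell
    # Give the JVM a larger heap via the gatk wrapper's --java-options flag.
    # -Xmx16G is an example value; size it to your node and dataset.
    HEAP="-Xmx16G"
    CMD="./gatk --java-options \"$HEAP\" SplitNCigarReads -R ref.fa -I in.bam -O out.split.bam"
    echo "$CMD"
    ```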

  • Sheila (Broad Institute), Member, Broadie, Moderator

    @srw6v
    Hi Stephen,

    Thanks for reporting your solution. I just asked the developer for some insight. I know the tmp dir had to be set in Picard separately, but I have not heard of this in GATK before.

    @gerzs Please let us know if this helps you.

    -Sheila
