
read_tsv failed because the file was too big

Running Cromwell 28.1 on Ubuntu 16.04.2, I encountered the following error:

Caused by: cromwell.backend.wdl.FileSizeTooBig: Use of WdlString(trimFastq.txt) failed because the file was too big (138170 bytes when only files of up to 128000 bytes are permissible)

The file in question has the format samplename forward_reads.fastq.gz reverse_reads.fastq.gz and contains only 159 lines! After some investigation, I discovered that because it is generated by a task running in a sub-workflow, Cromwell expands each file path to include the full path of the current workflow, and then of each nested sub-workflow, until every line in the file is over 800 characters long.

I think this limit should simply be removed: there is no way to discover it without having your workflow crash, and machines that run NGS analyses can certainly hold files larger than 128 KB in memory.


Answers

  • danb Member, Broadie
    edited July 2017

    This setting is configurable via system -> input-read-limits -> tsv in your Cromwell configuration file.

    That being said, 128 KB does seem small as a default. Also, we should include this configuration information in the exception message.
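
    For illustration, here is a minimal sketch of what that override might look like in a Cromwell configuration file (HOCON syntax); the 512000-byte value is only an example, not a recommended default:

        # application.conf (hypothetical override file)
        system {
          input-read-limits {
            tsv = 512000  # max size in bytes for a read_tsv input; the stock default is 128000
          }
        }

    You would then point Cromwell at the override file when launching it, e.g. java -Dconfig.file=application.conf -jar cromwell.jar ...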
