Dynamically sizing local disk for Array[File]

Hi,

Is there a way to apply the function size(File) to Array[File] to compute the total size for all the files in the array?

Thanks,
Seva

Best Answer

  • ChrisL, Cambridge, MA
    Accepted Answer

    Hi @skashin, there isn't a way to do this right now without a lot of effort, so this seems like a really good suggestion!

    I've submitted a PR on your behalf to the WDL spec to add this: https://github.com/openwdl/wdl/pull/169. Please go ahead and comment on it or upvote it. I don't foresee it being particularly controversial, but an endorsement never hurts!

    In the meantime, you can work around this, but I can only think of a really awkward way (I'd recommend importing this as a sub-workflow to hide the implementation details from your main WDL):

    workflow array_size {
      # your input:
      Array[File] files 
    
      # Use scatter to get the size of each file (size() returns bytes by default):
      scatter(f in files) { Int f_size = round(size(f)) }
    
      # Gather the results:
      Array[Int] f_sizes = f_size
    
      # Use a task to sum the array:
      call sum { input: ints = f_sizes }
    
      output { Int result = sum.sum }
    }
    
    task sum {
      Array[Int] ints
    
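      # Join the ints with "+" and let shell arithmetic evaluate the sum,
      # so [1, 2, 3] is rendered as: echo $(( 1+2+3 ))
      command {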
        echo $(( ${sep="+" ints} ))
      }
    
      output {
        Int sum = read_int(stdout())
      }
    }
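
    For the original goal of sizing the local disk, the result could then be used roughly as follows. This is only a sketch under several assumptions: the workflow above is saved as array_size.wdl (a hypothetical file name), the process task, the ubuntu:16.04 image and the 10 GB of padding are placeholders, and your backend honors the disks runtime attribute. Because size() returns bytes by default, the total is converted to GB first.

    ```wdl
    # Hypothetical file name for the array_size workflow shown above
    import "array_size.wdl" as sizes

    workflow main {
      Array[File] my_files

      # Total size of all files, in bytes
      call sizes.array_size { input: files = my_files }

      # Convert bytes to GB, round up, and add some padding (10 GB is an arbitrary choice)
      Int disk_gb = ceil(array_size.result / 1000000000.0) + 10

      call process { input: files = my_files, disk_gb = disk_gb }
    }

    # Placeholder task: stands in for whatever actually consumes the files
    task process {
      Array[File] files
      Int disk_gb

      command {
        ls -l ${sep=" " files}
      }

      runtime {
        docker: "ubuntu:16.04"
        disks: "local-disk " + disk_gb + " HDD"
      }
    }
    ```

    Summing sizes in bytes can produce large numbers. If that is a concern for your engine's Int type, passing a unit in the sub-workflow, for example size(f, "GB"), keeps the values small, at the cost of rounding each file's size.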
    
    

Answers

  • skashin

    Hi @ChrisL,
    Thanks for the prompt response and for creating a PR! (I commented on its usefulness on GitHub.)
    And thanks for the great workaround suggestion; I will give it a try.
