
Dynamically sizing local disk for Array[File]

Hi,

Is there a way to apply the function size(File) to Array[File] to compute the total size for all the files in the array?

Thanks,
Seva

Best Answer

  • ChrisL (Cambridge, MA; Member, Broadie, Dev)
    Accepted Answer

    Hi @skashin, WDL can't do this right now without a lot of effort, so this seems like a really good suggestion!

    I've submitted a PR on your behalf to the WDL spec to add this: https://github.com/openwdl/wdl/pull/169. Please go ahead and comment on it or upvote it. I don't foresee it being particularly controversial, but an endorsement never hurts!

    In the meantime, you can work around this, though I can only think of a really awkward way (I'd recommend importing it as a sub-workflow to hide the implementation details from your main WDL):

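    workflow array_size {
      # Your input:
      Array[File] files

      # Scatter to get the size of each file. size() returns a Float
      # (in bytes by default), so round() converts it to an Int:
      scatter(f in files) { Int f_size = round(size(f)) }

      # Outside the scatter, f_size is gathered into an Array[Int]:
      Array[Int] f_sizes = f_size

      # Use a task to sum the array:
      call sum { input: ints = f_sizes }

      # Total size of all the files, in bytes:
      output { Int result = sum.sum }
    }

    task sum {
      Array[Int] ints

      # The command expands to shell arithmetic,
      # e.g. echo $(( 1+2+3 )) for ints = [1, 2, 3]
      command {
        echo $(( ${sep="+" ints} ))
      }

      output {
        Int sum = read_int(stdout())
      }
    }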
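
    Since the original goal was to size local disk dynamically, here is a rough sketch of how the sub-workflow's result might be fed into a task's runtime `disks` attribute. It assumes draft-2 WDL run by Cromwell on the Google Cloud backend, and that the workflow above is saved as `array_size.wdl`. The task `concat_files`, its command, the 10 GB padding, and the Docker image are hypothetical placeholders, not part of the workaround itself.

    ```wdl
    # Hypothetical usage sketch: draft-2 WDL, Cromwell on Google Cloud assumed.
    # Assumes the array_size workflow above is saved as "array_size.wdl".
    import "array_size.wdl" as sizing

    workflow main {
      Array[File] in_files

      # Total input size in bytes, computed by the sub-workflow.
      call sizing.array_size { input: files = in_files }

      call concat_files {
        input:
          files = in_files,
          total_bytes = array_size.result
      }
    }

    # Hypothetical task used only to show the disk sizing.
    task concat_files {
      Array[File] files
      Int total_bytes

      # Bytes -> GB, rounded up, plus 10 GB of headroom (padding is an assumption).
      Int disk_gb = ceil(total_bytes / 1000000000.0) + 10

      command {
        cat ${sep=" " files} > combined.txt
      }

      runtime {
        docker: "ubuntu:16.04"
        disks: "local-disk " + disk_gb + " HDD"
      }

      output {
        File combined = "combined.txt"
      }
    }
    ```

    Computing the disk size inside the task keeps the sizing logic next to the runtime block that uses it. The padding should be adjusted to whatever the task writes alongside its inputs.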
    
    

Answers

  • Hi @ChrisL, thanks for the prompt response and for creating the PR! (I commented on GitHub about its usefulness.)
    Great suggestion for a workaround, too. I'll give it a try.
