Mounting data somehow other than by copying?

bhaas · Broad Institute · Member, Broadie

In one of my pipelines, the first step is downloading a 30G tar.gz file from a bucket followed by unpacking it. This takes a considerable amount of time. Is there a way to mount a read-only filesystem that has the data already unpacked so it doesn't have to be downloaded/unpacked for each workflow run?

Answers

  • bhaas · Broad Institute · Member, Broadie
    Accepted Answer

    Thanks! I'll try that instead and see how it goes.

  • bhaas · Broad Institute · Member, Broadie

    I'm looking at ~45G uncompressed. The computers I'm using are on the somewhat expensive side (50G RAM), so the cost of copying adds up across a large number of samples. If there's a quick mount command to a directory containing the resources, that would be phenomenal. Otherwise, I'll continue to copy things over.

  • dheiman ✭✭ · Member, Broadie

    The copying is from Google to Google, so I don't believe there are egress charges. Is 45G taking a significant amount of time to copy?

  • bhaas · Broad Institute · Member, Broadie

    It might be fine. It's just that I'm going to be doing this ~15k separate times, and those numbers (and $$) add up quickly.

  • bshifaw (admin) · Member, Broadie, Moderator

    Hi @bhaas

    Glad that dheiman was able to provide you with an answer.
    Mounting a read-only filesystem is not currently possible. One option may be to use streaming in the task by piping tools together (e.g. gsutil cat gs://bucket/archive.tar.gz | tar -xzf -), as sketched below.
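
    A minimal untested sketch of that pipe (the bucket path and target directory are placeholders, and it assumes gsutil is available in the task's environment):

        # Stream the archive straight from the bucket and unpack it in one
        # pass, so the compressed tarball never lands on local disk.
        mkdir -p resources
        gsutil cat gs://my-bucket/resources.tar.gz | tar -xzf - -C resources

    This trades local disk usage for network throughput, since the archive is decompressed as it streams in rather than after a full download.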

  • bhaas · Broad Institute · Member, Broadie
    Accepted Answer

    Thanks @bshifaw. I'll stick with the uncompressed tar copy and see how it goes. Much appreciated!
