Allow determining gsutil cp order using a parameter tag

ericco92 · Cambridge, UK · Member

When working with indexed files, it is quite common to have the program reading an index perform a quick check to make sure the index is newer than the file it indexes.
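For illustration, the kind of check described above might look like the sketch below. It assumes the tool compares local modification times; the function name `index_is_fresh` and the file names in the usage note are hypothetical, and real tools may compare different timestamps or only warn rather than fail.

```python
import os


def index_is_fresh(data_path, index_path):
    """Return True if the index was modified at or after the data file.

    Hypothetical helper: compares filesystem modification times only.
    """
    return os.path.getmtime(index_path) >= os.path.getmtime(data_path)
```

A caller could use it as `index_is_fresh("sample.bam", "sample.bam.bai")` and warn or refuse to proceed when it returns `False`.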

FireCloud uses gsutil cp to copy files back to the data store, but the order in which files are copied is not defined. Because the creation time recorded for each file reflects when its gsutil cp finishes at the destination, an index that is copied before the file it indexes ends up looking older than that file. It's pretty common for me to end up with indexes that appear older than their files.

It would be nice to have a queuing or management system to determine the order in which output files get copied back. I'm sure this would prevent the use of the gsutil -m flag for parallel copy, but I for one would be willing to pay the additional time penalty.

I can imagine this as a metadata parameter, or perhaps an outputs attribute (e.g. an ordered list: order : ["${output}", "${outputIndex}"]). It might also make sense to simply copy them back in the order they're created.
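A rough sketch of both ideas, to make the request concrete. This is not how FireCloud copies outputs; the function names, the `run` parameter, and the bucket path in the usage note are all hypothetical. The only real command assumed is `gsutil cp SRC DST`, run once per file and without -m so that each copy finishes before the next starts.

```python
import os
import subprocess


def by_creation_order(paths):
    """Sort local files oldest first, using modification time as a stand-in
    for creation order (an assumption; creation time is not always available)."""
    return sorted(paths, key=os.path.getmtime)


def copy_in_order(ordered_paths, destination, run=subprocess.check_call):
    """Copy files one at a time, in the given order.

    Each `gsutil cp` must finish before the next one starts, so later
    entries in the list receive later creation times at the destination.
    `run` is injectable so the ordering can be checked without gsutil.
    """
    for path in ordered_paths:
        run(["gsutil", "cp", path, destination])
```

With an explicit order list, the file goes first and its index second, for example `copy_in_order(["out.bam", "out.bam.bai"], "gs://hypothetical-bucket/outputs/")`. To copy in the order the files were created instead, pass `by_creation_order(paths)` as the first argument.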

