Very slow download of files from Google Cloud Storage
We're using Cromwell with Google Cloud Storage and Google's Pipelines API. Transferring files to GCS once a task has written its outputs is extremely fast (~13 seconds for 978 files). By contrast, transferring the same files to a new task (and its associated new VM) is extremely slow: about 532 seconds, or roughly half a second per file. This appears to be due to the way Cromwell copies files from GCS, issuing a separate gsutil cp command for each and every file.
An example copy command for a single file:
sudo gsutil -q -m cp gs://test-bucket/wdl_runner/work/cs/16738fac-5146-4a3c-9cfa-d5ded7f199fc/call-demultiplex_and_sample_prep/glob-9c1244b6ebf22abec57cd494340f8c79/CL101_invASISTR_segment_0.fasta /mnt/local-disk/test-bucket/wdl_runner/work/cs/16738fac-5146-4a3c-9cfa-d5ded7f199fc/call-demultiplex_and_sample_prep/glob-9c1244b6ebf22abec57cd494340f8c79/CL101_invASISTR_segment_0.fasta
The -m flag for a multi-threaded copy is set, which is great, but it has no effect because each command copies only a single file. Is there any way to change the copy command so that it downloads an entire bucket in one invocation? Or is there some other way to make the file transfer more efficient?
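To make the idea concrete, here is a rough sketch of the kind of batching I have in mind. It is not how Cromwell works today, and I don't know whether Cromwell offers a way to substitute its localization command. The sketch groups the gs:// URLs by their parent prefix and builds one `gsutil -m cp -I` command per prefix. The `-I` option makes gsutil read the list of objects to copy from stdin, so `-m` has many files to parallelize over. The function names, the `/mnt/local-disk` default, and the `dry_run` switch are my own assumptions for illustration. The destination layout mirrors the example command above, and `sudo` is left out.

```python
import os
import posixpath
import subprocess
from collections import defaultdict


def group_by_prefix(urls):
    """Group gs:// object URLs by their parent 'directory' prefix."""
    groups = defaultdict(list)
    for url in urls:
        groups[posixpath.dirname(url)].append(url)
    return dict(groups)


def local_dir_for(prefix, mount="/mnt/local-disk"):
    """Map gs://bucket/path to <mount>/bucket/path (hypothetical layout,
    copied from the single-file command in this post)."""
    return posixpath.join(mount, prefix[len("gs://"):])


def batched_copy(urls, mount="/mnt/local-disk", dry_run=True):
    """Build (and optionally run) one gsutil command per source prefix.

    Returns a list of (command, urls_fed_on_stdin) pairs.
    """
    commands = []
    for prefix, members in sorted(group_by_prefix(urls).items()):
        dest = local_dir_for(prefix, mount)
        # -m: parallel copy; cp -I: read the object list from stdin.
        cmd = ["gsutil", "-q", "-m", "cp", "-I", dest + "/"]
        commands.append((cmd, members))
        if not dry_run:
            os.makedirs(dest, exist_ok=True)
            subprocess.run(
                cmd, input="\n".join(members) + "\n", text=True, check=True
            )
    return commands
```

With `dry_run=True` (the default) nothing is executed, so the generated commands can be inspected first. For the 978 files in a single glob directory this would produce one gsutil invocation instead of 978. I have not measured what it would save in practice.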