WDL pipeline can't parse write_lines(Array[File]) on Google Cloud

cobalt137cc (Saint Louis, MO) Member

I'm running a custom Cromwell WDL workflow via wdl_runner and Google Genomics as described here: https://cloud.google.com/genomics/v1alpha2/gatk.

task L_Sort_VCF_Variants {
  Array[File] input_vcfs
  Int disk_size
  Int preemptible_tries

  command {
    ls ${write_lines(input_vcfs)}
    cat ${write_lines(input_vcfs)}
  }

  runtime {
    docker: "cc2qe/svtools:v1"
    cpu: "1"
    memory: "3.75 GB"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }

  output {
    String status = read_string(stdout())
  }
}

The task fails when it evaluates ${write_lines(input_vcfs)} in the command block, throwing the following error. This occurs on both Cromwell v0.24 and Cromwell v0.25. The WDL file and the inputs file are attached below.

ls: cannot access /cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/360711f9-214a-4dbb-960b-f88649bf324f/call-L_Sort_VCF_Variants/write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp: No such file or directory
cat: /cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/360711f9-214a-4dbb-960b-f88649bf324f/call-L_Sort_VCF_Variants/write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp: No such file or directory

It seems that the instance does create a file from the array with "gs://" bucket prefixes, and then correctly converts the file paths to local mappings. But the locally mapped temp file (write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp) is never synced to the instance, so the run fails.

write_lines_c40e77970ba532ef852472f947146a1a.tmp (original bucket mappings):

gs://ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-0/H_IJ-NA12878-NA12878_K10.gt.vcf
gs://ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-1/H_IJ-NA12891-NA12891_D2.gt.vcf

write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp (local mappings):

/cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-0/H_IJ-NA12878-NA12878_K10.gt.vcf
/cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-1/H_IJ-NA12891-NA12891_D2.gt.vcf

Answers

  • cobalt137cc (Saint Louis, MO) Member

    @Geraldine_VdAuwera said:
    Have a look at this discussion. It seems our usage example in the documentation is inaccurate; the comment I linked has a different syntax that seems to work (now that the compounding bug in Cromwell mentioned in that thread is fixed in C25):

    Thanks, Geraldine, that does indeed work, although, as @dheiman noted, the file contains the Google bucket paths rather than the local paths. This is a minor issue, though, as I can simply do a find/replace on that file.

    Thanks for the quick and helpful response!

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    You're welcome, that's what we're here for :)
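
The find/replace workaround mentioned above could look roughly like the sketch below. It assumes the mapping is exactly what the two listings in the question suggest: the "gs://" prefix is replaced by "/cromwell_root/" and the rest of the path is unchanged. The function name localize_paths and the file names in the usage note are hypothetical, and the rewritten paths are only useful if the VCFs themselves were actually copied to the instance.

```shell
# Hypothetical helper: rewrite each gs:// URL in a file of paths to the
# /cromwell_root/ layout seen in the listings above (assumed mapping).
localize_paths() {
  # Only a leading "gs://" is replaced; lines without it pass through as-is.
  sed 's|^gs://|/cromwell_root/|' "$1"
}
```

Inside the task's command block this would be called as `localize_paths bucket_paths.txt > local_paths.txt`, where bucket_paths.txt stands in for whatever file holds the bucket paths.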
