WDL pipeline can't parse write_lines(Array[File]) on Google Cloud

cobalt137cc (Saint Louis, MO), Member

I'm running a custom Cromwell WDL workflow via wdl_runner and Google Genomics as described here: https://cloud.google.com/genomics/v1alpha2/gatk.

task L_Sort_VCF_Variants {
  Array[File] input_vcfs
  Int disk_size
  Int preemptible_tries

  command {
    ls ${write_lines(input_vcfs)}
    cat ${write_lines(input_vcfs)}
  }

  runtime {
    docker: "cc2qe/svtools:v1"
    cpu: "1"
    memory: "3.75 GB"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptible_tries
  }

  output {
    String status = read_string(stdout())
  }
}

The task fails when parsing the ${write_lines(array)} expression and throws the following error. This occurs on both Cromwell v0.24 and Cromwell v0.25. The WDL file and the inputs file are attached below.

ls: cannot access /cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/360711f9-214a-4dbb-960b-f88649bf324f/call-L_Sort_VCF_Variants/write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp: No such file or directory
cat: /cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/360711f9-214a-4dbb-960b-f88649bf324f/call-L_Sort_VCF_Variants/write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp: No such file or directory

It seems that the instance does create a file from the array with "gs://" bucket prefixes and then converts the file paths to local mappings. But the locally mapped temp file (write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp) is never synced to the instance, so the run fails.

write_lines_c40e77970ba532ef852472f947146a1a.tmp (original bucket mappings):

gs://ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-0/H_IJ-NA12878-NA12878_K10.gt.vcf
gs://ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-1/H_IJ-NA12891-NA12891_D2.gt.vcf

write_lines_ec01a3c74ddbd7adf99118aa7ce649dd.tmp (local mappings):

/cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-0/H_IJ-NA12878-NA12878_K10.gt.vcf
/cromwell_root/ccdg-100-samples-trios-pilot-crams-mgi/workspace/SV_Detect/34c85eb9-55df-4823-8bbc-7a1a244eac26/call-SV_Genotype_Unmerged/shard-1/H_IJ-NA12891-NA12891_D2.gt.vcf

Answers

  • cobalt137cc (Saint Louis, MO), Member

    @Geraldine_VdAuwera said:
    Have a look at this discussion. It seems our usage example in the documentation is inaccurate; the comment I linked has a different syntax that seems to work (now that the compounding bug in Cromwell mentioned in that thread is fixed in C25):

    Thanks Geraldine, that does indeed work. However, as @dheiman noted, the file contains the Google bucket paths rather than the local paths. This is a minor issue, as I can simply do a find/replace on that file.

    Thanks for the quick and helpful response!

  • Geraldine_VdAuwera (Cambridge, MA), Member, Administrator, Broadie

    You're welcome, that's what we're here for :)
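
The find/replace mentioned above could be sketched roughly as follows. This is only an illustration and not something posted in the thread: the function name gs_to_local is hypothetical, and the rewrite assumes that every "gs://<bucket>/..." path maps to "/cromwell_root/<bucket>/...", which is the pattern visible in the two listings in the question. It also assumes the VCF files themselves have already been localized to the instance; rewriting the list does not copy anything.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): rewrite Google bucket paths to
# the /cromwell_root/ layout seen in the local-mapping listing above.
# Reads the named files, or standard input when no file is given.
gs_to_local() {
  # Replace a leading "gs://" on each line with "/cromwell_root/".
  # Lines that do not start with "gs://" pass through unchanged.
  sed 's|^gs://|/cromwell_root/|' "$@"
}
```

Inside a task's command block this might be used as `gs_to_local bucket_paths.txt > local_paths.txt`, where both file names are placeholders for the list written by write_lines and the rewritten copy.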
