Update: July 26, 2019
This section of the forum is no longer actively monitored. We are working on a support migration plan that we will share here shortly. Apologies for this inconvenience.

Unable to run WDL script on Google Cloud: Job status from Cromwell was not 'Submitted', instead 'fail'

Hi,

I am trying to run WDL scripts on Google Cloud. I started with a simple script (from the "seq-format-conversion" workflow) and was able to successfully convert FASTQ files to uBAMs. However, I cannot run a slightly more complex WDL script: a shortened version of fc_germline_single_sample_workflow.wdl from the five-dollar-genome-analysis-pipeline workflow, which I am testing with the samples provided for trials (the ones listed in the "gs://broad-public-datasets/NA12878_downsampled_for_testing/unmapped/NA12878.ubams.list" FOFN file). I plan to adapt this script for WES analysis (scattering BQSR over sets of exome intervals instead of using the whole-genome TSV, among other modifications), and this shortened script will be the base on which I build up the rest (I also thought I could save some of my GCP free credits by using the shortened script for trials).
I did not modify anything in the WDL script itself; I just reduced it to the "GetBwaVersion" and "SamToFastqAndBwaMemAndMba" tasks. I also arranged the WDL script in a single file to avoid having sub-workflows in independent files, which seems troublesome according to some posts here. I checked the syntax of the shortened script with wdltool and it looks fine.

This is the error that I get when running with the default wdl_runner file (the "wdl_pipeline.yaml" that uses the docker "gcr.io/broad-dsde-outreach/wdl_runner:2017_10_02"):

2019-02-06 07:45:21,834 sys_util INFO: CROMWELL->/cromwell/cromwell.jar
2019-02-06 07:45:21,834 sys_util INFO: CROMWELL_CONF->/cromwell/jes_template.conf
2019-02-06 07:45:21,840 discovery INFO: URL being requested: GET https://www.googleapis.com/discovery/v1/apis/storage/v1/rest
2019-02-06 07:45:21,878 discovery INFO: URL being requested: GET https://www.googleapis.com/storage/v1/b/mack812-prueba7/o?fields=nextPageToken%2Citems%28name%29&prefix=out%2Foutputs&alt=json&maxResults=2
2019-02-06 07:45:21,878 transport INFO: Attempting refresh to obtain initial access_token
2019-02-06 07:45:21,976 cromwell_driver INFO: Started Cromwell
2019-02-06 07:45:21,977 wdl_runner INFO: starting
2019-02-06 07:45:27,003 cromwell_driver INFO: Failed to connect to Cromwell (attempt 1): ('Connection aborted.', error(99, 'Cannot assign requested address'))
2019-02-06 07:45:32,007 cromwell_driver INFO: Failed to connect to Cromwell (attempt 2): ('Connection aborted.', error(99, 'Cannot assign requested address'))

ERROR: Job status from Cromwell was not 'Submitted', instead 'fail'

I have read several posts on this forum linking this error to a VM that is too small being created to launch Cromwell, so I am using the "--memory 5" argument. This is the command that I am using to start the run:

gcloud alpha genomics pipelines run \
  --pipeline-file wdl_pipeline.yaml \
  --zones us-central1-c \
  --memory 5 \
  --inputs-from-file WDL="${GATK_GOOGLE_DIR}/align_only_fc_hg38.wdl" \
  --inputs-from-file WORKFLOW_INPUTS="${GATK_GOOGLE_DIR}/align_only_fc_hg38_inputs.json" \
  --inputs-from-file WORKFLOW_OPTIONS="${GATK_GOOGLE_DIR}/generic.google-papi.options.json" \
  --inputs WORKSPACE="${GATK_OUTPUT_DIR}/workspace" \
  --inputs OUTPUTS="${GATK_OUTPUT_DIR}/outputs" \
  --logging "${GATK_OUTPUT_DIR}/logging"

I have also tried modifying "wdl_pipeline.yaml" by increasing the value of "resources: minimumRamGb:" from 3.75 to 5 to get a larger starting VM, but I get the same error.

I have also tried different Docker images, "gcr.io/broad-dsde-outreach/wdl_runner:2017_10_06-large_files" and "gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28", by modifying the "docker: imageName:" line in "wdl_pipeline.yaml", but this does not solve the problem either. When trying the latest version of the image ("wdl_runner:2018_11_28") I get a different error, though: "cp: cannot stat (my-whole-wdl-script): File name too long". From the stderr log:

cp: cannot stat ‘# This is a section of the fc_germline_single_sample_workflow.wdl, up to Alignment and Bam merging\n\nworkflow germline_single_sample_workflow {\n\n  File flowcell_unmapped_bams_fofn\n  Array[File] flowcell_unmapped_bams = read_lines(flowcell_unmapped_bams_fofn)\n\n  # File contamination_sites_ud\n  # File contamination_sites_bed\n  # File contamination_sites_mu\n  # File wgs_evaluation_interval_list\n  # File wgs_coverage_interval_list\n\n  # String sample_name\n  # String base_file_name\n  # String final_vcf_base_name\n  String unmapped_bam_suffix\n\n  # File wgs_calling_interval_list\n  # Int haplotype_scatter_count\n  # Int break_bands_at_multiples_of\n  # Int read_length = 250\n\n  File ref_fasta\n  File ref_fasta_index\n  File ref_dict\n  File ref_alt\n  File ref_bwt\n  File ref_sa\n  File ref_amb\n  File ref_ann\n  File ref_pac\n\n  # File dbSNP_vcf\n  # File dbSNP_vcf_index\n  # Array[File] known_indels_sites_VCFs\n  # Array[File] known_indels_sites_indices\n\n  Int preemptible_tries\n  Int agg_preemptible_tries\n\n  # Boolean skip_QC\n  # Boolean make_gatk4_single_sample_vcf\n  # Boolean use_gatk4_haplotype_caller\n\n  # Float cutoff_for_large_rg_in_gb = 20.0\n\n  String bwa_commandline="bwa mem -K 100000000 -p -v 3 -t 8 -Y $bash_ref_fasta"\n\n  # String recalibrated_bam_basename = base_file_name + ".aligned.duplicates_marked.recalibrated"\n\n  Int compression_level = 2\n\n  # Get the version of BWA to include in the PG record in the header of the BAM produced\n  # by MergeBamAlignment.\n  call GetBwaVersion\n\n  # Align flowcell-level unmapped input bams in parallel\n  scatter (unmapped_bam in flowcell_unmapped_bams) {\n\n    Float unmapped_bam_size = size(unmapped_bam, "GB")\n\n    String unmapped_bam_basename = basename(unmapped_bam, unmapped_bam_suffix)\n\n    # if (!skip_QC) {\n      # QC the unmapped BAM\n    #  call CollectQualityYieldMetrics {\n    #    input:\n    #      input_bam = unmapped_bam,\n    #      metrics_filename = 
unmapped_bam_basename + ".unmapped.quality_yield_metrics",\n    #      preemptible_tries = preemptible_tries\n    #  }\n    # }\n\n    # if (unmapped_bam_size > cutoff_for_large_rg_in_gb) {\n      # Split bam into multiple smaller bams,\n      # map reads to reference and recombine into one bam\n    #  call SplitRG {\n    #    input:\n    #      input_bam = unmapped_bam,\n    #      bwa_commandline = bwa_commandline,\n    #      bwa_version = GetBwaVersion.version,\n    #      output_bam_basename = unmapped_bam_basename + ".aligned.unsorted",\n    #      ref_fasta = ref_fasta,\n    #      ref_fasta_index = ref_fasta_index,\n    #      ref_dict = ref_dict,\n    #      ref_alt = ref_alt,\n    #      ref_amb = ref_amb,\n    #      ref_ann = ref_ann,\n    #      ref_bwt = ref_bwt,\n    #      ref_pac = ref_pac,\n    #      ref_sa = ref_sa,\n    #      compression_level = compression_level,\n    #      preemptible_tries = preemptible_tries\n    #  }\n    #}\n\n    #if (unmapped_bam_size <= cutoff_for_large_rg_in_gb) {\n    \n    # Map reads to reference\n    call SamToFastqAndBwaMemAndMba {\n      input:\n        input_bam = unmapped_bam,\n        bwa_commandline = bwa_commandline,\n        output_bam_basename = unmapped_bam_basename + ".aligned.unsorted",\n        ref_fasta = ref_fasta,\n        ref_fasta_index = ref_fasta_index,\n        ref_dict = ref_dict,\n        ref_alt = ref_alt,\n        ref_bwt = ref_bwt,\n        ref_amb = ref_amb,\n        ref_ann = ref_ann,\n        ref_pac = ref_pac,\n        ref_sa = ref_sa,\n        bwa_version = GetBwaVersion.version,\n        compression_level = compression_level,\n        preemptible_tries = preemptible_tries\n    }\n  }\n  # Outputs that will be retained when execution is complete\n  output {\n\n  Array[File] output_bam = SamToFastqAndBwaMemAndMba.output_bam\n\n    #Float mapped_bam_size = size(output_aligned_bam, "GB")\n  }\n}\n\n# Get version of BWA\ntask GetBwaVersion {\n  command {\n    # not setting set -o 
pipefail here because /bwa has a rc=1 and we dont want to allow rc=1 to succeed because\n    # the sed may also fail with that error and that is something we actually want to fail on.\n    /usr/gitc/bwa 2>&1 | \\\n    grep -e '^Version' | \\\n    sed 's/Version: //'\n  }\n  runtime {\n    docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"\n    memory: "1 GB"\n  }\n  output {\n    String version = read_string(stdout())\n  }\n}\n\n# Read unmapped BAM, convert on-the-fly to FASTQ and stream to BWA MEM for alignment, then stream to MergeBamAlignment\ntask SamToFastqAndBwaMemAndMba {\n  File input_bam\n  String bwa_commandline\n  String bwa_version\n  String output_bam_basename\n  File ref_fasta\n  File ref_fasta_index\n  File ref_dict\n\n  # This is the .alt file from bwa-kit (https://github.com/lh3/bwa/tree/master/bwakit),\n  # listing the reference contigs that are "alternative".\n  File ref_alt\n\n  File ref_amb\n  File ref_ann\n  File ref_bwt\n  File ref_pac\n  File ref_sa\n  Int compression_level\n  Int preemptible_tries\n\n  Float unmapped_bam_size = size(input_bam, "GB")\n  Float ref_size = size(ref_fasta, "GB") + size(ref_fasta_index, "GB") + size(ref_dict, "GB")\n  Float bwa_ref_size = ref_size + size(ref_alt, "GB") + size(ref_amb, "GB") + size(ref_ann, "GB") + size(ref_bwt, "GB") + size(ref_pac, "GB") + size(ref_sa, "GB")\n  # Sometimes the output is larger than the input, or a task can spill to disk.\n  # In these cases we need to account for the input (1) and the output (1.5) or the input(1), the output(1), and spillage (.5).\n  Float disk_multiplier = 2.5\n  Int disk_size = ceil(unmapped_bam_size + bwa_ref_size + (disk_multiplier * unmapped_bam_size) + 20)\n\n  command <<<\n    set -o pipefail\n    set -e\n\n    # set the bash variable needed for the command-line\n    bash_ref_fasta=${ref_fasta}\n    # if ref_alt has data in it,\n    if [ -s ${ref_alt} ]; then\n      java -Xms5000m -jar /usr/gitc/picard.jar \\\n        SamToFastq 
\\\n        INPUT=${input_bam} \\\n        FASTQ=/dev/stdout \\\n        INTERLEAVE=true \\\n        NON_PF=true | \\\n      /usr/gitc/${bwa_commandline} /dev/stdin - 2> >(tee ${output_bam_basename}.bwa.stderr.log >&2) | \\\n      java -Dsamjdk.compression_level=${compression_level} -Xms3000m -jar /usr/gitc/picard.jar \\\n        MergeBamAlignment \\\n        VALIDATION_STRINGENCY=SILENT \\\n        EXPECTED_ORIENTATIONS=FR \\\n        ATTRIBUTES_TO_RETAIN=X0 \\\n        ATTRIBUTES_TO_REMOVE=NM \\\n        ATTRIBUTES_TO_REMOVE=MD \\\n        ALIGNED_BAM=/dev/stdin \\\n        UNMAPPED_BAM=${input_bam} \\\n        OUTPUT=${output_bam_basename}.bam \\\n        REFERENCE_SEQUENCE=${ref_fasta} \\\n        PAIRED_RUN=true \\\n        SORT_ORDER="unsorted" \\\n        IS_BISULFITE_SEQUENCE=false \\\n        ALIGNED_READS_ONLY=false \\\n        CLIP_ADAPTERS=false \\\n        MAX_RECORDS_IN_RAM=2000000 \\\n        ADD_MATE_CIGAR=true \\\n        MAX_INSERTIONS_OR_DELETIONS=-1 \\\n        PRIMARY_ALIGNMENT_STRATEGY=MostDistant \\\n        PROGRAM_RECORD_ID="bwamem" \\\n        PROGRAM_GROUP_VERSION="${bwa_version}" \\\n        PROGRAM_GROUP_COMMAND_LINE="${bwa_commandline}" \\\n        PROGRAM_GROUP_NAME="bwamem" \\\n        UNMAPPED_READ_STRATEGY=COPY_TO_TAG \\\n        ALIGNER_PROPER_PAIR_FLAGS=true \\\n        UNMAP_CONTAMINANT_READS=true \\\n        ADD_PG_TAG_TO_READS=false\n\n      grep -m1 "read .* ALT contigs" ${output_bam_basename}.bwa.stderr.log | \\\n      grep -v "read 0 ALT contigs"\n\n    # else ref_alt is empty or could not be found\n    else\n      exit 1;\n    fi\n  >>>\n  runtime {\n    docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud:2.3.2-1510681135"\n    preemptible: preemptible_tries\n    memory: "8 GB"\n    cpu: "8"\n    disks: "local-disk " + disk_size + " HDD"\n  }\n  output {\n    File output_bam = "${output_bam_basename}.bam"\n    File bwa_stderr_log = "${output_bam_basename}.bwa.stderr.log"\n  }\n}’: File name too long
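For what it's worth, that "File name too long" message is what you get when the contents of the WDL file land where a file path was expected: the entire script becomes the "filename" and exceeds the filesystem's per-component limit (typically 255 bytes). A minimal Python sketch of the same failure mode (the fake path below is just an illustration, not the actual command the runner executed):

```python
import errno

# Simulate passing file *contents* where a file *path* was expected, as in
# the cp error above: a "filename" longer than NAME_MAX (255 bytes on most
# filesystems) fails with ENAMETOOLONG before the file is even looked up.
fake_path = "workflow germline_single_sample_workflow { ... }" * 100

try:
    open(fake_path)
except OSError as e:
    print(e.errno == errno.ENAMETOOLONG)  # True on Linux
```

So the symptom points at how the WDL was handed to the runner, not at the script's contents.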

Since, according to previous posts, the error seems to arise from Cromwell not finding enough resources to build the VMs, I thought it might have something to do with the quotas of the GCP project that I am using. In case this is important, these are my current quotas for this project in the us-central1 region: 300 CPUs, 500 preemptible CPUs, 10.5 TB Persistent Disk Standard, 51 in-use IP addresses.
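Those quotas look more than sufficient for this scatter. A rough sanity check (the shard count below is hypothetical; it is however many uBAMs the FOFN lists; 8 CPUs per shard comes from the task's runtime block):

```python
# Back-of-the-envelope quota check. Values marked "hypothetical" are
# assumptions for illustration, not taken from the actual run.
cpus_per_shard = 8   # "cpu: 8" in SamToFastqAndBwaMemAndMba's runtime block
shards = 24          # hypothetical number of uBAMs in the FOFN
cpu_quota = 300      # us-central1 CPU quota from the post

peak_cpus = cpus_per_shard * shards
print(peak_cpus, peak_cpus <= cpu_quota)  # 192 True -> quota is not the bottleneck
```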

This is the json file that I am using:

{
  "germline_single_sample_workflow.flowcell_unmapped_bams_fofn": "gs://broad-public-datasets/NA12878_downsampled_for_testing/unmapped/NA12878.ubams.list",
  "germline_single_sample_workflow.unmapped_bam_suffix": ".bam",

  "germline_single_sample_workflow.ref_dict": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
  "germline_single_sample_workflow.ref_fasta": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
  "germline_single_sample_workflow.ref_fasta_index": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
  "germline_single_sample_workflow.ref_alt": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt",
  "germline_single_sample_workflow.ref_sa": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa",
  "germline_single_sample_workflow.ref_amb": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb",
  "germline_single_sample_workflow.ref_bwt": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt",
  "germline_single_sample_workflow.ref_ann": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann",
  "germline_single_sample_workflow.ref_pac": "gs://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac",

  "germline_single_sample_workflow.agg_preemptible_tries": 3,
  "germline_single_sample_workflow.preemptible_tries": 3
}
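One thing worth checking with hand-trimmed inputs files: strict JSON parsers reject a trailing comma after the last key/value pair, so it pays to validate the file locally before submitting. A quick check, sketched in Python:

```python
import json

def is_valid_json(text):
    """Return True if text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# A trailing comma after the last entry is invalid JSON.
print(is_valid_json('{"preemptible_tries": 3,}'))  # False
print(is_valid_json('{"preemptible_tries": 3}'))   # True
```

Running the real inputs file through `json.loads` the same way catches such slips before the workflow is ever launched.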

I have also tried using the GATK and GITC Docker images hosted on Docker Hub instead of the ones on Google Cloud ("us.gcr.io/broad-gotc-prod"), by hard-coding them in the runtime section of the tasks, but the error message stays the same.

I am attaching the wdl script.

Sorry if I am missing something really obvious here; I am new to Cromwell, WDL and GCP... Thank you in advance for your help :)

Answers

  • mack812 (Spain, Member)

    Hi again,

    I found that the error is described much better if I use the wdl runner from this repo:
    https://github.com/broadinstitute/wdl-runner
    (previously I was using the one in "https://github.com/broadinstitute/wdl" as indicated in the wdl-runner tutorial)

    Running exactly the same code as before with this newer version of the wdl_runner, I get the following information when monitoring with "gcloud alpha genomics operations describe":

    error:
      code: 9
      message: 'Execution failed: action 1: unexpected exit status 1 was not ignored'
    metadata:
      events:
      - description: Worker released
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.WorkerReleasedEvent
          instance: google-pipelines-worker-4cf96499972381b16e14162338146333
          zone: us-central1-c
        timestamp: '2019-02-06T11:22:49.695813Z'
      - description: 'Execution failed: action 1: unexpected exit status 1 was not ignored'
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.FailedEvent
          cause: 'Execution failed: action 1: unexpected exit status 1 was not ignored'
          code: FAILED_PRECONDITION
        timestamp: '2019-02-06T11:22:48.022105Z'
      - description: Stopped running "/bin/sh -c gsutil -m -q cp /google/logs/output gs://mack812-prueba9/out/logging"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.ContainerStoppedEvent
          actionId: 8
          exitStatus: 0
          stderr: ''
        timestamp: '2019-02-06T11:22:47.905215Z'
      - description: Started running "/bin/sh -c gsutil -m -q cp /google/logs/output gs://mack812-prueba9/out/logging"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.ContainerStartedEvent
          actionId: 8
          ipAddress: ''
          portMappings: {}
        timestamp: '2019-02-06T11:22:46.533759Z'
      - description: Unexpected exit status 1 while running "/bin/sh -c gsutil -m -q cp
          gs://mack812-prueba9/out/workspace ${WORKSPACE}"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.UnexpectedExitStatusEvent
          actionId: 1
          exitStatus: 1
        timestamp: '2019-02-06T11:22:46.043935Z'
      - description: |-
          Stopped running "/bin/sh -c gsutil -m -q cp gs://mack812-prueba9/out/workspace ${WORKSPACE}": exit status 1: CommandException: No URLs matched: gs://mack812-prueba9/out/workspace
          CommandException: 1 file/object could not be transferred.
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.ContainerStoppedEvent
          actionId: 1
          exitStatus: 1
          stderr: |
            CommandException: No URLs matched: gs://mack812-prueba9/out/workspace
            CommandException: 1 file/object could not be transferred.
        timestamp: '2019-02-06T11:22:45.993807Z'
      - description: Started running "/bin/sh -c gsutil -m -q cp gs://mack812-prueba9/out/workspace
          ${WORKSPACE}"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.ContainerStartedEvent
          actionId: 1
          ipAddress: ''
          portMappings: {}
        timestamp: '2019-02-06T11:22:44.693663Z'
      - description: Stopped pulling "gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStoppedEvent
          imageUri: gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28
        timestamp: '2019-02-06T11:22:41.553135Z'
      - description: Started pulling "gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStartedEvent
          imageUri: gcr.io/broad-dsde-outreach/wdl_runner:2018_11_28
        timestamp: '2019-02-06T11:22:01.869631Z'
      - description: Stopped pulling "gcr.io/cloud-genomics-pipelines/tools"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStoppedEvent
          imageUri: gcr.io/cloud-genomics-pipelines/tools
        timestamp: '2019-02-06T11:22:01.804399Z'
      - description: Started pulling "gcr.io/cloud-genomics-pipelines/tools"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStartedEvent
          imageUri: gcr.io/cloud-genomics-pipelines/tools
        timestamp: '2019-02-06T11:21:23.813167Z'
      - description: Stopped pulling "google/cloud-sdk:slim"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStoppedEvent
          imageUri: google/cloud-sdk:slim
        timestamp: '2019-02-06T11:21:23.747311Z'
      - description: Started pulling "google/cloud-sdk:slim"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.PullStartedEvent
          imageUri: google/cloud-sdk:slim
        timestamp: '2019-02-06T11:21:04.251711Z'
      - description: Worker "google-pipelines-worker-4cf96499972381b16e14162338146333"
          assigned in "us-central1-c"
        details:
          '@type': type.googleapis.com/google.genomics.v2alpha1.WorkerAssignedEvent
          instance: google-pipelines-worker-4cf96499972381b16e14162338146333
          zone: us-central1-c
        timestamp: '2019-02-06T11:20:03.351729Z'
    

    The log generated in the output directory is:

    CommandException: No URLs matched: gs://mack812-prueba9/out/workspace
    CommandException: 1 file/object could not be transferred.
    

    Hope this helps

  • mack812 (Spain, Member)

    Sorry for all the hassle. I made it work. After reading the last error message, it was clear to me that there was a problem creating the workspace directory, so I changed the run command to:

    gcloud alpha genomics pipelines run \
      --pipeline-file wdl_pipeline.yaml \
      --zones us-central1-c \
      --memory 5 \
      --inputs-from-file WDL="${GATK_GOOGLE_DIR}/align_only_fc_hg38.wdl" \
      --inputs-from-file WORKFLOW_INPUTS="${GATK_GOOGLE_DIR}/align_only_fc_hg38_inputs.json" \
      --inputs-from-file WORKFLOW_OPTIONS="${GATK_GOOGLE_DIR}/generic.google-papi.options.json" \
      --env-vars WORKSPACE="${GATK_OUTPUT_DIR}/workspace",\
    OUTPUTS="${GATK_OUTPUT_DIR}/output" \
      --logging "${GATK_OUTPUT_DIR}/logging"
    

    That is, I changed the "--inputs" arg from the command in my first post to "--env-vars" for the workspace and output paths. "--env-vars" did not work with the previous wdl_runner but works with this one.

    So, summing up, the command above works fine with the wdl-runner version in the following repo, which seems to be the latest version of the wdl-runner:
    https://github.com/broadinstitute/wdl-runner

    The aligned BAMs were successfully produced in the cloud.

    Thanks
