How can I reference the outputs of a task in an if block?

gauthier Member, Broadie, Moderator, Dev

I'm trying to write a workflow along the lines of X->Y->Z, where X and Y are in a scatter and Y is conditional on a workflow input. Z expects an Array[File], but since Y is in an if block its output is an Array[File?] and I'm getting a coercion error. What's the right way to do this?

Specifically, my error is "No coercion defined from [Yshard1, Yshard2, Yshard3...] of type 'Array[File?]' to 'Array[File]'."

The relevant part of the WDL looks like this:

# Call variants in parallel over WGS calling intervals
  scatter (index in range(ScatterIntervalList.interval_count)) {
    # Generate GVCF by interval
    call HaplotypeCaller {
      input:
        contamination = CheckContamination.contamination,
        input_bam = GatherBamFiles.output_bam,
        interval_list = ScatterIntervalList.out[index],
        gvcf_basename = base_file_name,
        genotype_and_filter = genotype_and_filter,
        ref_dict = ref_dict,
        ref_fasta = ref_fasta,
        ref_fasta_index = ref_fasta_index,
        # Divide the total output GVCF size and the input bam size to account for the smaller scattered input and output.
        disk_size = ((binned_qual_bam_size + GVCF_disk_size) / hc_divisor) + ref_size + additional_disk,
        preemptible_tries = agg_preemptible_tries
     }
    if (do_filtering) {
      call FilterVcf {
        input:
          input_vcf = HaplotypeCaller.output_gvcf,
          input_vcf_index = HaplotypeCaller.output_gvcf_index,
          gvcf_basename = base_file_name,
          interval_list = ScatterIntervalList.out[index],
          gvcf_basename = base_file_name,
          # The output here should be the same size
          disk_size = ((binned_qual_bam_size + GVCF_disk_size) / hc_divisor) + ref_size + additional_disk,
          preemptible_tries = preemptible_tries
      }
    }
  }

  Array[File] merge_input = select_first([FilterVcf.output_vcf, HaplotypeCaller.output_gvcf])
  Array[File] merge_input_index = select_first([FilterVcf.output_vcf_index, HaplotypeCaller.output_gvcf_index])
  String name_token = if do_filtering then ".filtered" else ".g"

  # Combine by-interval GVCFs into a single sample GVCF file
  call MergeVCFs {
    input:
      input_vcfs = merge_input,
      input_vcfs_indexes = merge_input_index,
      output_vcf_name = final_gvcf_base_name + name_token + ".vcf.gz",
      disk_size = GVCF_disk_size,
      preemptible_tries = agg_preemptible_tries
  }

Answers

  • ChrisL Cambridge, MA Member, Broadie, Dev
    edited January 25

    The way to go from Array[X?] to Array[X] is using the function select_all(), which picks out only the values in the array of optionals that are set.

    In your example, I'd guess this would go in somewhere like:

    Array[File] merge_input = select_first([select_all(FilterVcf.output_vcf), HaplotypeCaller.output_gvcf])
    Array[File] merge_input_index = select_first([select_all(FilterVcf.output_vcf_index), HaplotypeCaller.output_gvcf_index])
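
    One caveat with the snippet above: select_all() returns an empty array, not an unset value, when none of the optional elements are set. So when do_filtering is false, select_first() would most likely pick that empty array instead of falling back to the HaplotypeCaller outputs. A minimal sketch of an alternative that chooses between the two whole arrays with if/then/else follows. It uses draft-2 style to match the thread; the workflow name, the String stand-ins for files, and the declaration inside the if block are illustrative assumptions, not the poster's pipeline.

    ```wdl
    # Hypothetical, self-contained sketch (no tasks needed).
    workflow SelectAllGather {
      Boolean do_filtering = false
      # Stand-ins for the per-shard HaplotypeCaller outputs (assumed names)
      Array[String] unfiltered = ["shard0.g.vcf.gz", "shard1.g.vcf.gz"]

      scatter (name in unfiltered) {
        if (do_filtering) {
          # Stand-in for the FilterVcf output; optional outside the if block
          String filtered_name = name + ".filtered"
        }
      }

      # Outside the scatter, filtered_name has type Array[String?].
      # select_all() drops unset elements, giving Array[String].
      Array[String] merge_input =
        if do_filtering then select_all(filtered_name) else unfiltered

      output {
        Array[String] result = merge_input
      }
    }
    ```

    Both branches of the if/then/else have the same non-optional array type, so the declaration needs no coercion from an optional type.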
    

    EDIT: I think I misread your WDL.

    It looks like you want something like this (I simplified the names, hopefully it's clear what they map back to):

    scatter {
      call X # produces File X.f
      if {
        call Y # produces File Y.f
      }
      File f = select_first([Y.f, X.f])
    }
    # Gather the Files in the normal way:
    Array[File] merge_input = f
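
    For anyone who wants to try the simplified pattern above end to end, here is a small sketch with the elided parts filled in. The task bodies, the shard list, and the do_filtering default are placeholders invented for illustration; only the scatter / if / select_first structure comes from the thread. It is written in draft-2 style, like the rest of the WDL here.

    ```wdl
    # Hypothetical stand-in for HaplotypeCaller: always runs.
    task X {
      String label
      command {
        echo "${label}" > x.txt
      }
      output {
        File f = "x.txt"
      }
    }

    # Hypothetical stand-in for FilterVcf: runs only when requested.
    task Y {
      File in_file
      command {
        tr 'a-z' 'A-Z' < ${in_file} > y.txt
      }
      output {
        File f = "y.txt"
      }
    }

    workflow SelectFirstInScatter {
      Boolean do_filtering = true
      Array[String] shards = ["a", "b", "c"]

      scatter (shard in shards) {
        call X { input: label = shard }

        if (do_filtering) {
          call Y { input: in_file = X.f }
        }

        # Inside the scatter, Y.f is File? and X.f is File.
        # select_first() returns the first value that is set, as a File.
        File f = select_first([Y.f, X.f])
      }

      # Outside the scatter, f is gathered into Array[File].
      Array[File] merge_input = f

      output {
        Array[File] merged = merge_input
      }
    }
    ```

    Resolving the optional per shard, inside the scatter, means the gathered value is already an Array[File] and the downstream call never sees an optional type.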
    
    
  • Ruchi Member, Broadie, Dev

    @gauthier I believe one way to get what you need is to try this:

    # Call variants in parallel over WGS calling intervals
      scatter (index in range(ScatterIntervalList.interval_count)) {
        # Generate GVCF by interval
        call HaplotypeCaller {
          input:
            contamination = CheckContamination.contamination,
            input_bam = GatherBamFiles.output_bam,
            interval_list = ScatterIntervalList.out[index],
            gvcf_basename = base_file_name,
            genotype_and_filter = genotype_and_filter,
            ref_dict = ref_dict,
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            # Divide the total output GVCF size and the input bam size to account for the smaller scattered input and output.
            disk_size = ((binned_qual_bam_size + GVCF_disk_size) / hc_divisor) + ref_size + additional_disk,
            preemptible_tries = agg_preemptible_tries
         }
        if (do_filtering) {
          call FilterVcf {
            input:
              input_vcf = HaplotypeCaller.output_gvcf,
              input_vcf_index = HaplotypeCaller.output_gvcf_index,
              gvcf_basename = base_file_name,
              interval_list = ScatterIntervalList.out[index],
              # The output here should be the same size
              disk_size = ((binned_qual_bam_size + GVCF_disk_size) / hc_divisor) + ref_size + additional_disk,
              preemptible_tries = preemptible_tries
          }
        }
    
        File final_vcf = select_first([FilterVcf.output_vcf, HaplotypeCaller.output_gvcf])
        File final_vcf_index = select_first([FilterVcf.output_vcf_index, HaplotypeCaller.output_gvcf_index])
      }
    
      Array[File] merge_input = final_vcf
      Array[File] merge_input_index = final_vcf_index
      String name_token = if do_filtering then ".filtered" else ".g"
    
      # Combine by-interval GVCFs into a single sample GVCF file
      call MergeVCFs {
        input:
          input_vcfs = merge_input,
          input_vcfs_indexes = merge_input_index,
          output_vcf_name = final_gvcf_base_name + name_token + ".vcf.gz",
          disk_size = GVCF_disk_size,
          preemptible_tries = agg_preemptible_tries
      }
    
    