Parameters in workflow input JSON not being passed to workflow WDL?

amr@broadinstitute.org Member, Broadie

Hi,

I'm trying to put together a GATK WDL and can't figure out what I'm doing wrong. Here's the error:

  "failures": [
    {
      "causedBy": [
        {
          "causedBy": [],
          "message": "Required workflow input gatk.samples_file not specified."
        },
        {
          "causedBy": [],
          "message": "Required workflow input gatk.reference not specified."
        }
      ],

Here is my workflow input:

{
  "samples_file": "/cil/shed/sandboxes/amr/dev/gatk_pipeline/data/small/input.tsv",
  "picard_path": "/cil/shed/apps/external/picard/current/bin/picard.jar",
  "reference": "/cil/shed/sandboxes/amr/dev/gatk_pipeline/data/small/CneoH99_supercont2.1_200k_300k.fa",
  "output_dir": "/cil/shed/sandboxes/amr/dev/gatk_pipeline/output/small_test",
  "overwrite": "true"
}

And here is my partial WDL:

# GATK WDL V0
# Subset 1

task IndexReference {
    File reference # from JSON
    String picard_path # from JSON
    String out_file

    command {
    use BWA
    bwa index ${reference}
    java -jar ${picard_path} CreateSequenceDictionary REFERENCE=${reference} O=${out_file}
    use Samtools
    samtools faidx ${reference}
    }
}

task MakeDirs {
    String output_dir # from JSON
    String sample_name # from csv

    command {
        mkdir -p ${output_dir}/${sample_name}
    }
    output {
        String sample_dir = "${output_dir}/${sample_name}/"
    }
}

task SamToFastq {
    String picard_path # from JSON
    String in_bam # from TSV
    String fq1 = sub(in_bam, "\\.bam$", ".1.fastq") # first end fastq
    String fq2 = sub(in_bam, "\\.bam$", ".2.fastq") # second end fastq

    command {
        java -Xmx12G -jar ${picard_path} SamToFastq INPUT=${in_bam} FASTQ=${fq1} SECOND_END_FASTQ=${fq2}
    }
    output {
        Array[String] fq_set = ["${fq1}", "${fq2}"]
    }
}

task AlignBAM {
    String reference
    String sample_dir
    String sample_name

    command {
        use BWA
        bwa mem -t 8 ${reference} SamToFastq.fq_set[0] SamToFastq.fq_set[1] > ${sample_dir}${sample_name}.sam # removed read group, can add back later
    }
}

workflow gatk {
    File samples_file
    File reference

    call IndexReference {
        input: reference=reference,
        picard_path=picard_path,
        out_file = sub(reference, "\\.fa*$", ".dict")
    }
    # This scatter block
    scatter(sample in read_tsv(samples_file)) {

        call MakeDirs {
            input: output_dir = output_dir,
            sample_name = sample[0]
        }
        call SamToFastq {
            input: picard_path = picard_path,
            in_bam = sample[1],
        }

        call AlignBAM {
            input: reference = reference,
            sample_dir = MakeDirs.sample_dir,
            sample_name = sample[0]
        }
    }
}

I thought that because the variable 'samples_file' has the same name in my input JSON, WDL would pick up its value from there, but that seems not to be the case. I also thought that having input: in the calls would tell WDL to take those inputs from the JSON, but I'm missing something here.
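
For what it's worth, the error message names the missing inputs as gatk.samples_file and gatk.reference, which suggests the engine looks up each key in the inputs JSON by its fully qualified name (workflow name, a dot, then the declared input name) rather than by the bare variable name. A minimal sketch of the inputs file under that assumption, covering only the two inputs the workflow block actually declares:

```json
{
  "gatk.samples_file": "/cil/shed/sandboxes/amr/dev/gatk_pipeline/data/small/input.tsv",
  "gatk.reference": "/cil/shed/sandboxes/amr/dev/gatk_pipeline/data/small/CneoH99_supercont2.1_200k_300k.fa"
}
```

Note that picard_path and output_dir are used in the call input: sections but are not declared in the workflow block, so they would presumably also need declarations there (for example String picard_path and String output_dir) and matching gatk.picard_path and gatk.output_dir keys in the JSON. The input: sections only wire workflow-level values into task inputs; they do not read from the JSON themselves.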
