Questions regarding "minimal WDL for joint genotyping"

Hi, Geraldine recommended that we ask our questions here. They concern a WDL she shared with us, WGS_Joint_Analysis_160909.wdl, and its inputs file, WGS_Joint_Analysis_160909.inputs.json.

  1. In the inputs.json file, I see that all input files are specified using full path names. The output files for each job, however, do not have a full path specified. For instance, “unzipped_basename” for task UnzipGVCF is just defined as “temp_unzipped”. Does each task instance get its own unique output directory, assigned by the job scheduler (e.g. Cromwell)?
  2. I see that JointAnalysis.scattered_calling_intervals has 50 intervals. Does that mean the scatter that calls GenotypeGVCFs would launch 50 Docker containers, each handling one interval and each requiring 10 GB of memory (as specified in the runtime section of task GenotypeGVCFs)?
  3. Some of the “File” inputs defined in tasks are not explicitly referenced inside the task. Are they used implicitly by the application called in the task? For instance, “File ref_dict” in task UnzipGVCF is not explicitly used in “command”, but it is probably used by the application GATK4.jar, which implicitly derives the file name of ref_dict from the fasta file and assumes ref_dict is located in the same directory as the fasta file?
  4. Geraldine mentioned that the workflow ran to completion on a whole-genome sample. Is there any information on how long it took to complete, and how large the input data was?

Thanks!

Kitty

Answers

  • Kittyfly (Member, Broadie)

    Thanks. That answers my questions.