GATK not found when running WDL script locally

Yatros (Seattle, WA, USA; Member)

Hello,

I have a problem similar to the one reported in this old post:
https://gatkforums.broadinstitute.org/wdl/discussion/10047/cromwell-wdl-with-gatk-launch

I have no issues when I run a single gatk command in the console. However, when the gatk command is within a WDL script, the gatk program won't be located by WDL+CROMWELL, no matter what I do.

I'm using GATK-4.0.1.2 to run a custom version of the joint-discovery-gatk4.wdl pipeline (https://github.com/gatk-workflows/gatk4-germline-snps-indels/blob/master/joint-discovery-gatk4.wdl)

My JSON file has the following input for the gatk_path:
"JointGenotyping.gatk_path": "/mnt/user/opt/gatk-4.0.1.2/gatk"

The first time my WDL script calls GATK is in the ImportGVCFs task, which looks like this:

  call ImportGVCFs {
    input:
      sample_name_map = sample_name_map,
      interval = unpadded_intervals[idx],
      workspace_dir_name = "genomicsdb",
      disk_size = medium_disk,
      docker_image = gatk_docker,
      gatk_path = gatk_path,
      batch_size = 50
  }


task ImportGVCFs {
  File sample_name_map
  String interval

  String workspace_dir_name

  String java_opt
  String gatk_path  

  String docker_image
  Int disk_size
  String mem_size
  Int preemptibles
  Int batch_size

  command <<<
    set -e

    rm -rf ${workspace_dir_name}

    # The memory setting here is very important and must be several GB lower
    # than the total memory allocated to the VM because this tool uses
    # a significant amount of non-heap memory for native libraries.
    # Also, testing has shown that the multithreaded reader initialization
    # does not scale well beyond 5 threads, so don't increase beyond that.
    ${gatk_path} --javaOptions "${java_opt}" \
    GenomicsDBImport \
    --genomicsDBWorkspace ${workspace_dir_name} \
    --batchSize ${batch_size} \
    -L ${interval} \
    --sampleNameMap ${sample_name_map} \
    --readerThreads 5 \
    -ip 500

    tar -cf ${workspace_dir_name}.tar ${workspace_dir_name}

  >>>
  runtime {
    docker: docker_image
    memory: mem_size
    cpu: "2"
    disks: "local-disk " + disk_size + " HDD"
    preemptible: preemptibles
  }
  output {
    File output_genomicsdb = "${workspace_dir_name}.tar"
  }
}

I already exported the local jar by adding the following line to my .bash_profile file:
GATK_LOCAL_JAR=/mnt/user/opt/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar
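
A side note on the line above: a plain assignment sets the variable for that shell only, and child processes inherit it only once it is exported. The sketch below is plain POSIX shell with nothing GATK-specific in it; it just shows the difference between the two forms. Whether this matters for the task is a separate question, since a process running inside a Docker container does not inherit the host shell's environment.

```shell
#!/bin/sh
# Start from a clean slate so the demonstration does not depend on the caller's environment.
unset GATK_LOCAL_JAR

# Plain assignment: visible in this shell only.
GATK_LOCAL_JAR=/mnt/user/opt/gatk-4.0.1.2/gatk-package-4.0.1.2-local.jar
plain=$(sh -c 'echo "${GATK_LOCAL_JAR:-unset}"')

# Exported: inherited by child processes.
export GATK_LOCAL_JAR
exported=$(sh -c 'echo "${GATK_LOCAL_JAR:-unset}"')

echo "child after plain assignment: $plain"
echo "child after export: $exported"
```

If the first line prints "unset", the variable was never passed on to programs launched from that shell.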

No matter what I try, every time the WDL script reaches the ImportGVCFs task I get this error:

/cromwell-executions/JointGenotyping/b26479f8-22b6-4b89-a755-3b7b5231ecf2/call-ImportGVCFs/shard-0/execution/script: line 16: /mnt/user/opt/bin/bin/gatk: No such file or directory
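
Two details in that message are worth checking. The path it reports (/mnt/user/opt/bin/bin/gatk) is not the one given in the JSON input, and the script itself sits under /cromwell-executions while the task declares a docker runtime, so it may be running inside a container where host paths are not visible. That second point is an assumption to verify, not a diagnosis. A minimal sketch for testing both paths in whichever environment the script runs; check_launcher is a hypothetical helper, not part of GATK or Cromwell:

```shell
#!/bin/sh
# Hypothetical helper (not part of GATK or Cromwell): report whether a path
# is an executable file in the environment where this script runs.
check_launcher() {
  if [ -x "$1" ] && [ ! -d "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Path given in the JSON input.
check_launcher "/mnt/user/opt/gatk-4.0.1.2/gatk" || true
# Path reported in the error message.
check_launcher "/mnt/user/opt/bin/bin/gatk" || true
```

Running it once on the host and once inside the image named by docker_image would show where each path actually resolves.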

I also tried to add an additional entry in the JSON file pointing to the actual jar file as suggested in the old post, but it didn't work either.

Can anybody suggest any solutions so that my WDL+CROMWELL script can locate the gatk file when running it locally?

Thank you very much,

Best,

Yatros
