【GATK4 SPARK】When will the Spark tools be released for production use in GATK4?

GraceZou · China · Member

The Spark tools in GATK4 are currently available only as beta versions, and we can't use them commercially because their results differ from those of the non-Spark tools. When will they be ready for commercial use? Do you have any plans for this?


  • shlee · Cambridge · Member, Broadie, Moderator, admin

    Hi @GraceZou,

    It is up to researchers to vet any of our BETA tools for production use. The BETA status just means we have not finished all of our internal validations. Sorry, I'm not aware of any future production plans. You could periodically check the following repository for new and sanctioned workflows.

    Production pipelines will have the label broad-prod. Each workflow step pins a specific tool version through the Docker image tag set under runtime in the WDL pipeline, e.g.:

    task CollectQualityYieldMetrics {
      File input_bam
      String metrics_filename
      Float disk_size
      Int preemptible_tries
      command {
        java -Xms2000m -jar /usr/gitc/picard.jar \
          CollectQualityYieldMetrics \
          INPUT=${input_bam} \
          OQ=true \
          OUTPUT=${metrics_filename}
      }
      runtime {
        disks: "local-disk " + sub(disk_size, "\\..*", "") + " HDD"
        memory: "3 GB"
        preemptible: preemptible_tries
        docker: "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786"
      }
      output {
        File metrics = "${metrics_filename}"
      }
    }

    shows that this step uses the Docker container "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786". This is just a virtual environment set up with particular versions of tools installed. Running docker inspect on the image should list the version of the particular Picard tool. If you are unfamiliar with WDL, Docker, etc., you can get a sense of how they work from this article.
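To see at a glance which Docker image each step of a WDL workflow pins, a rough sketch like the one below could help. It is Python rather than WDL, the helper names (docker_images_by_task, split_image_tag) are hypothetical, and it relies on simple regular expressions that assume one `task Name {` header per task and a `docker: "image:tag"` line inside that task's runtime block. It is not a real WDL parser and ignores image digests (`@sha256:...`).

```python
import re
from typing import Dict, Tuple

# Hypothetical helper: a regex sketch, NOT a real WDL parser.
# Assumes each task starts with a `task Name {` line and declares its image
# on a single `docker: "image:tag"` line somewhere after that header.
TASK_RE = re.compile(r"\s*task\s+(\w+)\s*\{")
DOCKER_RE = re.compile(r'\s*docker:\s*"([^"]+)"')


def docker_images_by_task(wdl_text: str) -> Dict[str, str]:
    """Map each task name to the Docker image string in its runtime block."""
    images: Dict[str, str] = {}
    current_task = None
    for line in wdl_text.splitlines():
        task_match = TASK_RE.match(line)
        if task_match:
            current_task = task_match.group(1)
            continue
        docker_match = DOCKER_RE.match(line)
        if docker_match and current_task is not None:
            images[current_task] = docker_match.group(1)
    return images


def split_image_tag(image: str) -> Tuple[str, str]:
    """Split 'repo:tag' into (repo, tag); no tag means Docker's default 'latest'."""
    repo, sep, tag = image.rpartition(":")
    # A colon followed by a '/' belongs to a registry port, not to a tag.
    if not sep or "/" in tag:
        return image, "latest"
    return repo, tag


if __name__ == "__main__":
    example = """
task CollectQualityYieldMetrics {
  runtime {
    docker: "broadinstitute/genomes-in-the-cloud:2.3.1-1512499786"
  }
}
"""
    for task, image in docker_images_by_task(example).items():
        print(task, split_image_tag(image))
```

Once you have the image and tag for a step, running docker inspect on that image is where you would look for the tool versions it bundles.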
