
GATK best practices pipeline written in Scala for Queue


I tried to run GATK HaplotypeCaller, but it is too slow on my WGS data. I have been reading the GATK forums and am trying out Queue to make the pipeline faster. So far I have worked through HelloWorld.scala and CountingReads.scala and read about traits. I have used Java in the past, so I understand the basics here.

I see several threads about Queue, but I am not sure which is the latest and most comprehensive, so I decided to post the question here.

1) I would like to implement the following steps in Queue, and would like to know whether any Scala scripts already exist for this workflow:

  • Sort BAM
  • Mark duplicates
  • Add or replace read groups
  • Reorder BAM
  • Index BAM
  • GATK HaplotypeCaller
  • GATK VariantFiltration

2) Until now I have been using a shell script that loops my workflow (the steps above) over every BAM file I have. When using Queue, how should I implement scatter-gather here? I did not see any documentation for that. How can I get the QScript to run on my BAM files in parallel?
PS: I do not have a cloud or a cluster to work on. I am trying this out on my local Ubuntu machine.
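
For reference, the per-BAM parallelism asked about in 2) can be sketched outside Queue as a small Python driver that runs one HaplotypeCaller process per BAM file on a single machine. This is only a minimal sketch under assumptions: the function names, the jar name GenomeAnalysisTK.jar, the file names, and the worker count are illustrative, and the command uses GATK 3-style arguments (-T, -R, -I, -o). It parallelizes across BAM files only; it does not split a single BAM by interval the way scatter-gather does.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def haplotype_caller_cmd(bam, reference, gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3-style HaplotypeCaller command for one BAM file.

    The jar name and output naming scheme are assumptions for illustration.
    """
    out_vcf = os.path.splitext(bam)[0] + ".vcf"  # sample1.bam -> sample1.vcf
    return ["java", "-jar", gatk_jar, "-T", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-o", out_vcf]


def run_all(commands, max_workers=2, runner=subprocess.run):
    """Run each command as its own process, at most max_workers at a time.

    Threads are enough here because each one only waits on a child process.
    check=True makes a failing command raise instead of passing silently.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda cmd: runner(cmd, check=True), commands))


if __name__ == "__main__":
    bams = ["sample1.bam", "sample2.bam"]  # hypothetical input files
    run_all([haplotype_caller_cmd(b, "ref.fasta") for b in bams])
```

The worker count should be chosen with memory in mind, since every concurrent HaplotypeCaller process needs its own Java heap.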

Best Answers


  • Thanks. I'm working through the WDL tutorials and have a question: why are some parameters given in quotes?

    e.g. Tutorial 2 - Write a simple multi-step workflow: https://software.broadinstitute.org/wdl/userguide/topic?name=wdl-tutorials

    Why is it necessary to give type="SNP" and type="INDEL" in quotes?

    The GATK SelectVariants tool itself does not require the type to be in quotes.

  • Sheila (Broad Institute; Member, Broadie)


    I just moved your discussion to the "Ask the WDL team" section. Kate @KateN will help you here.
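
On the quoting question above, one way to picture it: in the workflow script the quotes mark a string literal, and only the value inside them is substituted into the command that finally runs. The Python sketch below is an analogy, not WDL itself; the template text and the render function are illustrative assumptions, with -selectType being the GATK 3 SelectVariants argument.

```python
from string import Template

# Stand-in for the command section of a task; ${type} is a placeholder
# that gets filled in when the task is called.
COMMAND = Template(
    "java -jar GenomeAnalysisTK.jar -T SelectVariants -selectType ${type}"
)


def render(select_type: str) -> str:
    """Substitute the string value into the command template."""
    return COMMAND.substitute(type=select_type)


# The quotes belong to the script's syntax, not to the value:
snp_cmd = render("SNP")
indel_cmd = render("INDEL")
print(snp_cmd)
print(indel_cmd)
```

Without the quotes, SNP would be read as the name of a variable rather than as text, which is why the script needs them even though the command line that SelectVariants receives does not.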

