GATK Best Practices pipeline written in Scala for Queue

Hello,

I tried GATK HaplotypeCaller, but it is too slow on my WGS data. I have been reading the GATK forums and am trying out Queue to make the pipeline faster. I have gotten as far as HelloWorld.scala and CountingReads.scala, and have been reading about traits. I have used Java in the past, so I understand the basics here.

I see several threads about Queue, but I am not sure which is the latest and most comprehensive, so I decided to post the question here.

1) I would like to implement the following in Queue, and would like to know whether any Scala scripts already exist for this workflow:

  • Sort Bam
  • Mark duplicates
  • Add or replace read groups
  • Reorder Bam
  • Index Bam
  • GATK HaplotypeCaller
  • GATK VariantFiltration

2) Until now I have been using a shell script that loops my workflow (the steps above) over every BAM file that I have. When using Queue, how should I implement scatter-gather here? I did not see any documentation for that. How can I get the QScript to run on my BAM files in parallel?
PS: I do not have a cloud or a cluster to work on. I am trying this out on my local Ubuntu machine.
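
To make the per-BAM parallelism in (2) concrete on a single local machine, here is a minimal sketch in Python rather than a QScript. It is not Queue's scatter-gather (which, as far as I understand it, splits one job into pieces and merges the outputs); it only runs the whole chain for several BAM files at once with a bounded worker pool. The step labels mirror the list in (1). `echo_command`, `run_sample`, and `run_all` are hypothetical names, and the placeholder command just echoes the step, so the real tool invocations are left to whoever adapts it.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# The seven steps from the list in (1), in order. These are labels only;
# the real command line for each step is deliberately not filled in here.
STEPS = [
    "sort_bam",
    "mark_duplicates",
    "add_read_groups",
    "reorder_bam",
    "index_bam",
    "haplotype_caller",
    "variant_filtration",
]


def echo_command(step, bam):
    """Hypothetical placeholder: an argv that only prints the step and file."""
    return ["echo", f"{step}: {bam}"]


def run_sample(bam, make_command=echo_command, run=subprocess.run):
    """Run every step for one BAM in order; check=True stops at the first failure."""
    done = []
    for step in STEPS:
        run(make_command(step, bam), check=True)
        done.append(step)
    return bam, done


def run_all(bams, max_workers=2, **kwargs):
    """Process several BAMs at once; steps within one BAM stay sequential."""
    # Threads are enough here because each worker just waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda b: run_sample(b, **kwargs), bams))
```

A call such as `run_all(["sample1.bam", "sample2.bam"], max_workers=2)` would process two files side by side. `max_workers` should stay small on a desktop, since each worker may itself be a multi-gigabyte Java process.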

Answers

  • Thanks. I'm working through the tutorials on WDL and have a question. Why are some parameters given in quotes?

    e.g. Tutorial 2 - Write a simple multi-step workflow: https://software.broadinstitute.org/wdl/userguide/topic?name=wdl-tutorials

    Why is it necessary to give type="SNP" and type="INDEL" in quotes?

    The GATK SelectVariants tool does not require type to be in quotes
    https://software.broadinstitute.org/gatk/documentation/tooldocs/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php
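
On the quoting question, a rough analogy in Python (not WDL itself) may help: in a workflow script the quotes mark a string literal, as in most programming languages, and they are not part of the value that ends up on the tool's command line. The helper `select_variants_args` is hypothetical, and the `-selectType` flag is assumed from the GATK 3 style used in that tutorial.

```python
def select_variants_args(variant_type):
    """Build a hypothetical argument list; the flag name is an assumption."""
    return ["-T", "SelectVariants", "-selectType", variant_type]


# "SNP" needs quotes here because it is a string literal in the script;
# a bare SNP would be read as a variable name.
snp_args = select_variants_args("SNP")

# The joined command line carries the value without any quotes.
print(" ".join(snp_args))  # prints: -T SelectVariants -selectType SNP
```

So the tool itself still receives a bare SNP or INDEL; the quotes only exist at the script level.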

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @kritikool
    Hi,

    I just moved your discussion to the "Ask the WDL team" section. Kate @KateN will help you here.

    -Sheila
