GATK Best Practices pipeline written in Scala for Queue
I tried running GATK HaplotypeCaller, but it is too slow on my WGS data. I have been reading the GATK forums and am trying out Queue to make the pipeline faster. So far I have worked through HelloWorld.scala and CountingReads.scala, and have been reading about traits. I have used Java in the past, so I understand the basics here.
I see several threads about Queue, but I am not sure which is the latest and most comprehensive, so I decided to post my question here.
1) I would like to implement the following workflow in Queue, and would like to know whether any Scala scripts already exist for it:
- Sort BAM
- Mark duplicates
- Add or replace read groups
- Reorder BAM
- Index BAM
- GATK HaplotypeCaller
- GATK VariantFiltration
2) Until now I have been using a shell script that loops my workflow (the steps above) over every BAM file I have. When using Queue, how should I implement the scatter-gather here? I did not see any documentation for that. How can I get the QScript to run on my BAM files in parallel?
PS: I do not have a cloud or a cluster to work on. I am trying this out on my local Ubuntu machine.