Scatter Gather and Spark together
I can't find any recommendations on how to use scatter-gather and Spark together.
We do panel diagnostics and whole exomes, so a run may contain up to 60 samples. My first idea was to use scatter-gather to analyse a few samples at the same time. Our server crashed with a concurrent-job-limit > 3 because we ran out of RAM and all cores were at 100%. Since I plan to use Spark in the future, I wanted to know
- if it is a good idea to use scatter-gather and Spark together,
- how they work with each other,
- and if there are recommendations on how I can calculate the number of Cromwell jobs and Spark workers from my hardware.
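To make the last point concrete, here is a rough sketch of the kind of back-of-the-envelope calculation I have in mind. The function name, the per-job figures and the reserved RAM are all made-up assumptions for illustration, not values from the GATK or Cromwell docs. It only assumes that every concurrent job needs a fixed number of cores and a fixed amount of RAM, where the cores per job would be the threads a single Spark task is allowed to use.

```python
def max_concurrent_jobs(total_cores, total_ram_gb, cores_per_job,
                        ram_per_job_gb, reserved_ram_gb=4):
    """Rough upper bound on jobs that fit at once without
    oversubscribing cores or RAM (all inputs are assumptions)."""
    if cores_per_job <= 0 or ram_per_job_gb <= 0:
        raise ValueError("per-job cores and RAM must be positive")

    # How many jobs fit if only CPU cores were the limit.
    by_cores = total_cores // cores_per_job

    # Keep some RAM back for the OS and Cromwell itself (hypothetical reserve).
    usable_ram_gb = max(total_ram_gb - reserved_ram_gb, 0)

    # How many jobs fit if only memory were the limit.
    by_ram = int(usable_ram_gb // ram_per_job_gb)

    # The scarcer resource decides.
    return min(by_cores, by_ram)


# Hypothetical server: 32 cores, 128 GB RAM; each job uses 8 threads and 32 GB.
limit = max_concurrent_jobs(32, 128, cores_per_job=8,
                            ram_per_job_gb=32, reserved_ram_gb=8)
print(limit)
```

With these invented numbers memory is the bottleneck, and the result would be the value I would try for concurrent-job-limit. Whether this is the right way to think about combining Cromwell jobs and Spark workers is exactly what I would like to know.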
Please just point me in the right direction if there is already a tutorial that answers my question.
Thanks and best regards,