about distribution and parallelization

Bogdan

Dear all, good afternoon.

As I am very new to parallelizing the GATK pipeline and have just started reading about Queue, I thought I could ask you the following:

If we have a list of intervals (-L option) that includes all the human chromosomes, could we tell GATK or Mutect to run 1-2 chromosomes on each processor of a cluster node, in parallel? Would any additional software be required to do so? Many thanks,

-- bogdan


  Geraldine_VdAuwera

    Hi Bogdan,

    Yes, that's one of the built-in features of Queue. We call it scatter-gather. By default each GATK tool "knows" the most appropriate way to split up its jobs, so all you need to do is set the tool's scatter count property (sc) to specify how many ways the job should be split. You can find more details here, in the presentation slide deck and video called "Parallelism with Queue". Queue jobs are meant by default to be run in parallel on a cluster via a job scheduler, but it's also possible to run them in parallel locally on multiprocessor machines, thanks to a contribution from an external developer.
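
    To make the scatter-gather idea concrete, here is a minimal conceptual sketch in Python. It is not Queue's implementation and does not call GATK: the function names (scatter, process_group, scatter_gather) and the CHROMOSOMES list are hypothetical, and process_group is only a stand-in for one job run on a subset of the intervals. It shows the three steps: split the interval list into scatter-count groups, process the groups in parallel, then merge the partial results in the original order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical interval list: one entry per human chromosome.
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def scatter(intervals, scatter_count):
    """Split intervals into at most scatter_count contiguous, near-equal groups."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    n = min(scatter_count, len(intervals))
    if n == 0:
        return []
    base, extra = divmod(len(intervals), n)
    groups, start = [], 0
    for i in range(n):
        # The first `extra` groups take one additional interval each.
        size = base + (1 if i < extra else 0)
        groups.append(intervals[start:start + size])
        start += size
    return groups


def process_group(group):
    """Stand-in for one scattered job (assumption: a real job would run a
    tool restricted to these intervals and write a partial output file)."""
    return [f"{chrom}:done" for chrom in group]


def scatter_gather(intervals, scatter_count, worker=process_group, max_workers=None):
    """Scatter the intervals, run worker on each group in parallel, gather results."""
    groups = scatter(intervals, scatter_count)
    if not groups:
        return []
    # Threads suffice here because real jobs would be external processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(worker, groups))  # map preserves group order
    # Gather: concatenate the partial results in the original interval order.
    return [item for part in partial_results for item in part]


if __name__ == "__main__":
    # 24 chromosomes split 12 ways gives 2 chromosomes per job.
    for group in scatter(CHROMOSOMES, 12):
        print(group)
    print(scatter_gather(CHROMOSOMES, 12))
```

    With 24 chromosomes and a scatter count of 12, each group holds two chromosomes, which matches the "1-2 chromosomes per processor" layout asked about above. In Queue itself the splitting and merging are handled by the tool, so only the scatter count needs to be set.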

