parallelizing HC on PBS with Queue

marcoc (Milan, Member)

I'm attempting to use Queue on a PBS Pro HPC cluster. I have tested a custom Scala script for HaplotypeCaller and it runs. However, following the discussion on the GATK forum, I understand that I need a job scheduler to dispatch Queue's jobs across several nodes. Could you give me some advice or examples of the type of scheduler I need on a PBS Pro system?
I tried running Queue on a single node and it seems to run faster. My question is: when I run Queue on a single node, am I actually multithreading HC, or am I wrong?
Thanks a lot
Best Regards

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @marcoc,

    There are several ways to parallelize processing. Multi-threading is a name commonly (but not exclusively) given to the form of parallelism where you run multiple threads (parts of the program's operation) on the same machine. When you have a cluster available, you can add another layer of parallelism that we call scatter-gather. That involves dividing the task you want to run into multiple parts (which is what Queue does for you) and using a job scheduler to dispatch each part to a different node, or machine, in the cluster.
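
    To make the scatter-gather idea above concrete, here is a minimal conceptual sketch in Python. It is not how Queue is implemented: the function names (`scatter`, `call_variants`, `gather`, `scatter_gather`) are hypothetical, `call_variants` is only a stand-in for one HaplotypeCaller job, and local threads stand in for the cluster nodes that a scheduler would dispatch to.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(intervals, n_parts):
    """Split a list of intervals into at most n_parts contiguous chunks."""
    n_parts = max(1, min(n_parts, len(intervals)))
    size, extra = divmod(len(intervals), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks get one additional interval each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(intervals[start:end])
        start = end
    return chunks


def call_variants(chunk):
    """Hypothetical stand-in for one HaplotypeCaller job on one chunk."""
    return [f"calls:{interval}" for interval in chunk]


def gather(results):
    """Concatenate per-chunk results, preserving the original order."""
    return [item for part in results for item in part]


def scatter_gather(intervals, n_parts):
    chunks = scatter(intervals, n_parts)
    # Assumption: threads play the role of cluster nodes here. On a real
    # cluster, the job scheduler would run each chunk as a separate job.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        results = list(pool.map(call_variants, chunks))
    return gather(results)
```

    For example, `scatter_gather(["chr1", "chr2", "chr3"], 2)` processes `["chr1", "chr2"]` and `["chr3"]` independently and then merges the per-chunk results back into a single ordered list.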

    Now, when you say that you are running on a PBS Pro HPC cluster you actually answered your own question -- PBS is the name of the scheduler that runs on your system :)

    Depending on the system, you might interact with the scheduler differently. In the simplest configuration, you just need to specify the name of the job queue that you use, and Queue will send the jobs it defines there, so your system can do the rest. I have not used PBS itself (we use LSF), so if you're not sure how you should send jobs to your scheduler, I would recommend asking your IT support people or systems administrator for help.

  • marcoc (Milan, Member)

    Many thanks for the quick reply, and excuse me for the trivial questions :blush: I'm still becoming familiar with Queue and the GATK tools. I will work with the cluster managers to find a solution for parallelizing GATK on PBS.
    Thanks again
    Best Regards
    Marco Cirilli

