Executing multiple tasks in a single SGE job


We would like to optimize our workflows to use the local SSDs on our HPC compute nodes and minimize I/O wait. Is it possible to run subworkflows in "local" backend mode while all other tasks in the main workflow run as individual SGE jobs?

For example, we might want to run an individual sample from FASTQ through HaplotypeCaller as a subworkflow on a single compute node, using the local SSD for intermediate files and then copying the final result to shared storage. These subworkflows would be scattered across the samples in our batch. Once the scatter (all samples) has completed, a new subworkflow would be called, again on a single compute node in "local" backend mode, to run joint genotyping through the filter/annotation steps, copying the final results to shared storage while keeping any intermediates on the local SSD.

In other words, we are considering dispatching a single SGE job per sample to take the sequencing data through data preprocessing and variant discovery, then a new SGE job to do joint genotyping (for all samples in the batch), annotation, etc.
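To make the idea concrete, here is a rough Python sketch of what each per-sample job would do on its node. It is only an illustration of the stage-locally-then-copy pattern, not something Cromwell or SGE provides: the function name `run_on_local_scratch` and its parameters are hypothetical, each "step" is a placeholder callable standing in for a real tool invocation, and it assumes `TMPDIR` points at node-local disk, which depends on how the cluster is configured.

```python
import os
import shutil
import tempfile
from pathlib import Path


def run_on_local_scratch(steps, final_outputs, shared_dir, scratch_root=None):
    """Run all steps in a node-local scratch directory and copy only the
    named final outputs to shared storage (hypothetical helper)."""
    # Assumption: TMPDIR points at node-local disk (site-specific setup).
    scratch_root = scratch_root or os.environ.get("TMPDIR") or tempfile.gettempdir()
    workdir = Path(tempfile.mkdtemp(prefix="sample_", dir=scratch_root))
    try:
        # Every step reads and writes its intermediates inside workdir.
        for step in steps:
            step(workdir)

        shared = Path(shared_dir)
        shared.mkdir(parents=True, exist_ok=True)

        # Only the requested final files leave the node.
        copied = []
        for name in final_outputs:
            copied.append(Path(shutil.copy2(workdir / name, shared / name)))
        return copied
    finally:
        # Intermediates are discarded whether or not the steps succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

In our case the steps for one sample would be alignment through HaplotypeCaller, and `final_outputs` would be just the per-sample result that joint genotyping needs later; the joint genotyping job would follow the same pattern with its own steps.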

Would something like this be possible/recommended?

Thanks much for your time and thoughts.

