GATK Queue and the Data Processing Pipeline
What is the best way to get Queue to make full use of a given number of cores in an SGE cluster? DataProcessingPipeline.scala has a hidden parameter, "scatter_gather", which sets the nContigs variable. Is it safe to use this option? For example, if you had 100 cores available in your cluster, could you set the option to 100? Is there any advantage to setting it higher than the number of cores?
When the option is not set, Queue appears to set nContigs based on the number of chromosomes in the BAM input. So with a whole-genome BAM it's 25, with your example Chr20 data it's 1, and with an unaligned BAM it's 0. Does that mean that, when starting from unaligned data, the pipeline runs on a single core?
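To make the behavior I'm describing concrete, here is a minimal Python sketch of what I think is happening. It is only a model of my observations, not Queue's actual code: the function name `effective_scatter_count`, the contig names, and the assumption that an explicit scatter_gather value simply replaces the default are all mine.

```python
from typing import Optional, Sequence


def effective_scatter_count(contigs: Sequence[str],
                            scatter_gather: Optional[int] = None) -> int:
    """Hypothetical model of how nContigs seems to be chosen.

    Assumption: an explicit scatter_gather value is used as-is;
    otherwise the count falls back to the number of contigs
    found in the input BAM.
    """
    if scatter_gather is not None:
        return scatter_gather
    return len(contigs)


# Illustrative inputs matching the three cases in the question.
whole_genome = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]
chr20_only = ["chr20"]
unaligned = []  # an unaligned BAM has no contigs to count

print(effective_scatter_count(whole_genome))    # default for a whole-genome BAM
print(effective_scatter_count(chr20_only))      # default for the Chr20 example
print(effective_scatter_count(unaligned))       # default for an unaligned BAM
print(effective_scatter_count(unaligned, 100))  # explicit override
```

If this model is right, the default for unaligned input is the case I am worried about, and passing the option explicitly would be the only way to get more than one job.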