I have access to a cluster with 8 nodes, each with 64 GB of RAM and 8 cores. I'm trying to process 30 samples using UnifiedGenotyper. Each sample is an exome of around 62e6 bases with an average coverage of around 60x. Can you give me any advice on -nt/-nct configuration to optimize performance, both with and without Queue?
We don't have the resources right now to give case-by-case advice on configurations, so you'll need to experiment with your setup based on the general guidelines in the article. You may want to discuss it with the people who manage your cluster, as they may also have some helpful insights. Good luck!
On the DepthOfCoverage page http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_coverage_DepthOfCoverage.html it says DepthOfCoverage supports -nt, but when run the walker says it does not support parallel execution. Is there a quick way to find out (before running) what tools support each option?
Also, some of the nodes I use have hyperthreading. Should I double the -nt (or -nct), or just use the physical number of cores?
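For the hyperthreading question, it helps to know both numbers before experimenting. The following is a minimal sketch, assuming a Linux node with lscpu (util-linux) available; the function names count_physical_cores and count_logical_cpus are hypothetical helpers, not part of GATK. Both read `lscpu -p=CORE,SOCKET` output on stdin, where each non-comment line describes one logical CPU as "core,socket".

```shell
# Count physical cores: hyperthreaded siblings share the same core,socket pair,
# so the number of unique pairs is the number of physical cores.
count_physical_cores() {
    grep -v '^#' | sort -u | grep -c .
}

# Count logical CPUs (hardware threads): one non-comment line per logical CPU.
count_logical_cpus() {
    grep -v '^#' | grep -c .
}

# Usage on a Linux node:
#   lscpu -p=CORE,SOCKET | count_physical_cores
#   lscpu -p=CORE,SOCKET | count_logical_cpus
```

Whether -nt or -nct should match the physical or the logical count is something to test on your own hardware; this only reports the two candidates.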
The tech doc is the best way to find out whether a tool supports parallelism or not. If it's listed as TreeReducible, DoC should support -nt. Can you please post the command line you tried and the error message that you got?
We cannot run DepthOfCoverage with -nt either; we have tried both GATK 2.3-4 and 2.5-2.
The command line is:
java -Xmx2g -jar ./GenomeAnalysisTK.jar -R /repo/ref/ref.fasta -T DepthOfCoverage -o coverage_out_nt4 -I ./data/Sample1.bam -nt 2
And got error message:
We've recently tracked down the reason for this. Please see this thread:
I am working on the parallelization of a whole-exome pipeline. I split a BAM file by chromosome and perform recalibration and realignment separately on 24 BAM files. However, I have noticed that I am getting slightly different variant calls compared to the BAM file produced by a pipeline without parallelization. The difference appears at the realignment step, where the regions look differently aligned. I am wondering whether this is because RealignerTargetCreator and IndelRealigner are not properly supported with the scatter-gather technique (we are not using Queue to perform scatter-gather).
Can you please post a couple of screenshots showing the differences you see?
Hello there! Was wondering if you could update this wonderful post to include recommendations for the HaplotypeCaller?
Does the above recommendation of 4 cluster nodes assume that there are exactly 4 nodes available, so that if more nodes were available they could also be used? Or does it mean that even if more than 4 nodes are available, using more than 4 nodes would actually slow down the process (due to I/O issues, maybe)?
Sorry to get back to you so late, your comment seems to have slipped through my net. We're working on a set of new docs for HC, so I'll include an update for this doc as well. Can't promise an ETA though, might be a week or two before we get to it.
Your first hypothesis is correct -- the example assumes 4 nodes. If you have more, feel free to use more. In our hands it takes a lot more before we see any I/O issues. But that can depend on your platform, so you may want to experiment a little before launching any important jobs.
I have a 6-core i7 CPU that, with hyper-threading, gives me 12 logical cores, and I have 64 GB of DDR3 RAM. I want to run UG as fast as possible on this system. What would be the maximum -nt and -nct values for it? Would this command make sense:
java -jar GenomeAnalysisTK.jar \
-R resources/Homo_sapiens_assembly18.fasta \
-T UnifiedGenotyper \
-I sample1.bam [-I sample2.bam ...] \
--dbsnp dbSNP.vcf \
-o snps.raw.vcf \
We can't provide specific recommendations for determining what multithreading values make sense for individual systems; you'll need to experiment and figure it out on your own, sorry.
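One way to run that kind of experiment is a small timing loop. This is only a sketch under stated assumptions: bench_threads and the NT_VALUES variable are hypothetical names, the loop simply appends `-nt N` to whatever command you pass it, and the timing is whole seconds from `date +%s`, so it is only meaningful for runs lasting minutes or more.

```shell
# bench_threads CMD [ARGS...]
# Runs "CMD ARGS... -nt N" once for each N in NT_VALUES (default: 1 2 4 6)
# and prints the elapsed wall-clock seconds for each run.
bench_threads() {
    local n start end
    for n in ${NT_VALUES:-1 2 4 6}; do
        start=$(date +%s)
        # Tool output is discarded here; redirect to a log file if you need it.
        "$@" -nt "$n" >/dev/null 2>&1
        end=$(date +%s)
        echo "nt=$n seconds=$((end - start))"
    done
}

# Usage (paths are placeholders, substitute your own):
#   NT_VALUES="1 2 4 8" bench_threads java -jar GenomeAnalysisTK.jar \
#       -T UnifiedGenotyper -R ref.fasta -I sample1.bam -o snps.raw.vcf
```

Timing a small interval or a single chromosome first keeps the experiment short; the same loop could be adapted for -nct by changing the appended flag.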
Thank you Geraldine for the response; I totally understand why you don't provide any recommendations. As far as I know, if people run their CPU at close to 100% usage they might face some issues (e.g., high CPU temperature, crashes in the middle of their jobs, a slow system if they need to run something else, and so on).
I figured out a better way to tune my variant calling efficiently for my system, and for ME! For those of you who want to adjust -nt and -nct for your own system without running the same command with different -nt values and comparing the times, I have a recommendation.
If you install a tool like sensors (command line) or psensor (GUI), you can monitor your CPU temperature and CPU usage in real time. I have 12 logical cores (hyper-threaded 6-core i7 at 3.9 GHz, and 64 GB RAM), and if I run UG with -nt 6 I get around 50% CPU usage and about 55 degrees Celsius on each of my 6 physical cores. (I have a good cooling system, so even if my CPU usage is about 100% I still don't have an overheating problem.)
However, I personally don't like running a single command that uses about 85% of my CPU, so -nt 6 is perfect for me, as it uses 50% and I don't get any overheating! (By the way, for those who have speed fever, I don't recommend overclocking your CPU for a bioinformatics job that might run for more than 5 hours.)
Good luck with adjusting your -nt value
Thanks for reporting this, @alirezakj -- I'm sure it will be useful for others.
Thank YOU Geraldine for all of your efforts!
By the way, I forgot to report that UG without an -nt value (the default) uses 9% of my CPU at 37 C! So -nt certainly helps a lot, thanks to the GATK development team!
For those who are very interested in speed, please keep in mind that if you have a slow HDD, you have a speed bottleneck right there, and -nt might not help much beyond a certain value!
I have 20 TB of RAID 0 HDD space (5 x 4 TB Western Digital Black drives, each 7200 RPM, SATA 6.0 Gb/s, 64 MB cache), which makes my I/O very fast!
Howdy GATK folk!
@Geraldine_VdAuwera - Thought you and the crowd here might benefit: we've found that PrintReads seems to stop scaling after about -nt 8 and doesn't seem to need much RAM for those 8; a Java heap of around 8 GB (-Xmx8g) seems to be enough. Might just be our fun SSD-heavy HPC system though (speaking of disk speed).
Thanks for reporting that, @Tristan! Sounds about right for PrintReads -- not a lot of processing going on there, the burden is mostly all I/O, unless you're recalibrating bases, which does take a little more for the calculations (but nothing crazy).
Hi Geraldine, @Geraldine_VdAuwera, I was wondering: if I split the BAM file for RealignerTargetCreator, will it affect the downstream analysis? Would you recommend doing this in order to speed up RealignerTargetCreator?
@rsArgMar12, did you see any differences in your analysis?
@hns04 You can safely split by chromosome for RTC, but any smaller than that you risk edge effects. If you have a split in the middle of a region that should be realigned it won't be identified correctly.
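A per-chromosome split like this can be sketched by generating one RealignerTargetCreator command per contig with -L. This is a rough illustration, not an official recipe: print_rtc_commands is a hypothetical helper, it assumes the reference has a samtools-style .fai index next to it (contig name in the first tab-separated column), and the output names are made up. It only prints the commands so you can review them or hand them to your scheduler.

```shell
# print_rtc_commands REF.fasta INPUT.bam
# Prints one RealignerTargetCreator command per contig listed in REF.fasta.fai.
print_rtc_commands() {
    local ref=$1 bam=$2 contig
    cut -f1 "${ref}.fai" | while read -r contig; do
        # -L restricts the run to a whole contig, never to part of one,
        # so no realignment region is cut in the middle.
        echo "java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R $ref -I $bam -L $contig -o ${contig}.intervals"
    done
}

# Usage (file names are placeholders):
#   print_rtc_commands ref.fasta sample1.bam > rtc_jobs.sh
```

Keeping the unit of work at the whole-contig level is what avoids the edge effects described above.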
I'm running GATK with UnifiedGenotyper on several RNAseq files. Most of the time I'm able to specify -nt 16 and it will run on all 16 cores. On occasion, it refuses to use more than 1 core, even when I specify 16. The cluster resources are clearly available (no other jobs presently running). Does GATK or UnifiedGenotyper do any kind of memory or resource evaluation before initializing multiple threads? Is there a way to force it to only run when it has the resources to use all appointed threads?
To be more specific, when the multithreading works, I see this:
INFO 12:09:36,280 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
INFO 12:09:36,292 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
for each thread it initiates.
When it doesn't work, I only see these two log lines once.
That's the only difference in the stdout from GATK.
I was able to figure out what the problem was. The bam file wasn't indexed (no .bai file present).
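A quick pre-flight check can catch this before a job is submitted. The sketch below assumes the index sits next to the BAM file under one of the two common names (sample.bam.bai or sample.bai); has_bam_index is a hypothetical helper name.

```shell
# has_bam_index FILE.bam
# Succeeds (exit 0) if FILE.bam.bai or FILE.bai exists next to the BAM file.
has_bam_index() {
    local bam=$1
    [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Usage (the file name is a placeholder):
#   if ! has_bam_index sample1.bam; then
#       echo "sample1.bam has no index; run: samtools index sample1.bam" >&2
#       exit 1
#   fi
```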
Glad to hear you solved your problem. That said, please note that we don't recommend using UG on RNAseq data; you should use HaplotypeCaller instead. Be sure to follow the best practices described in the documentation for best results.