I tried using -nct as documented here, but when I run it with the -nct option I get this error: The analysis HaplotypeCaller currently does not support parallel execution with nct. Please run your analysis without the nct option.
Thank you in advance.
Answers
@fborja
Hi,
Which version of GATK are you using? It seems like you may be using an older version that does not support parallelism.
Please upgrade to the latest version (3.2-2) which does support the use of -nct.
-Sheila
Hello, indeed this is the case. Thank you for the quick response. I have a follow-up: I have read that for the -nt option, 24 cores is roughly the maximum that still gives an effective speedup. What is the equivalent for -nct? Thanks!
@fborja
Hello,
I am glad that solved it!
As for an exact value for -nct, we do not provide exact values because it depends on people's hardware and setup. We recommend trying out a few different numbers and seeing what works best for you.
Good luck.
-Sheila
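For anyone who wants to try out a few different numbers in a systematic way, here is a minimal Python sketch of a timing harness. It is only an illustration, not an official recommendation: the file names (GenomeAnalysisTK.jar, reference.fasta, sample.bam, nct_test.vcf) and the test interval "20" are hypothetical placeholders, and the helper names with_nct and benchmark are made up for this example. Restricting the run to one small interval with -L keeps each trial short.

```python
import subprocess
import time

# Hypothetical base command; every path and the interval "20" are
# placeholders to be replaced with your own files and contig name.
BASE_CMD = [
    "java", "-jar", "GenomeAnalysisTK.jar",
    "-T", "HaplotypeCaller",
    "-R", "reference.fasta",
    "-I", "sample.bam",
    "-L", "20",
    "-o", "nct_test.vcf",
]


def with_nct(base_cmd, nct):
    """Return a copy of base_cmd with '-nct <nct>' appended."""
    return list(base_cmd) + ["-nct", str(nct)]


def benchmark(base_cmd, nct_values, run=subprocess.run):
    """Run the command once per -nct value and return {nct: seconds}.

    `run` is injectable so the harness can be dry-run without GATK.
    """
    timings = {}
    for nct in nct_values:
        cmd = with_nct(base_cmd, nct)
        start = time.perf_counter()
        run(cmd, check=True)  # raises if the command exits non-zero
        timings[nct] = time.perf_counter() - start
    return timings
```

Calling benchmark(BASE_CMD, [1, 2, 4, 8]) would run the same small job four times and return the wall-clock time for each setting. A single run per setting is noisy, so repeating each one a few times gives a fairer comparison.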
OK, I will play with this parameter. Though, do you have any recommendation for a CentOS 6.2 64-bit, 48-core, 512 RAM, AMD Opteron HPC?
@fborja
Hi,
No, we do not have any specific recommendations for you.
-Sheila
Ok. Thank you!
I have been using HaplotypeCaller with the -nct 4 and -nct 6 options. Monitoring the CPU usage shows that HaplotypeCaller is utilizing the provided cores. However, the tool completes in the same time it takes when -nct 1 is provided. In essence, it is not completing the process faster as expected. Could someone suggest what could be done to speed up the process? Thanks.
@mehar
Hi,
Users have reported nct and nt to be unpredictable. The best thing to do is try Queue. http://gatkforums.broadinstitute.org/discussion/1306/overview-of-queue
-Sheila
Dear Sheila,
Thank you. I have checked the Queue pages, and I understand that Queue is used to organize/automate tasks in the pipeline. However, here I have been using only the HaplotypeCaller command, which might not need Queue as there is only one task. Our idea here is to speed up the haplotype calling process rather than queuing multiple tasks. Please correct me if I have misunderstood.
@mehar You can't speed up the operation of the program itself, but with Queue scatter-gather you can split up the overall task into many small tasks that can be run in parallel, so overall you will be done sooner.
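To make the scatter idea concrete, here is a rough Python sketch that splits one HaplotypeCaller job into one job per interval and runs several at once. This is not Queue and not how Queue is implemented; it only illustrates the principle by hand. The helper names scatter_commands and run_scattered, the output prefix, and the interval names are all hypothetical, and the per-interval VCFs would still need to be merged afterwards (the "gather" step, which Queue handles for you).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def scatter_commands(base_cmd, intervals, out_prefix):
    """Build one command per interval, each writing its own VCF.

    base_cmd is assumed to hold everything except -L and -o,
    e.g. java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ... -I ...
    """
    commands = []
    for interval in intervals:
        out_path = f"{out_prefix}.{interval}.vcf"
        commands.append(list(base_cmd) + ["-L", interval, "-o", out_path])
    return commands


def run_scattered(commands, max_parallel, run=subprocess.run):
    """Run the commands with at most max_parallel running at a time.

    Threads are enough here because each worker just waits on an
    external process. `run` is injectable for dry runs.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(run, cmd, check=True) for cmd in commands]
        for future in futures:
            future.result()  # re-raise any failure from a worker
```

For example, scatter_commands(base_cmd, ["1", "2", "3"], "sample") yields three commands writing sample.1.vcf, sample.2.vcf and sample.3.vcf, and run_scattered(commands, max_parallel=3) runs them side by side. Each job does the same amount of work as before on its own interval; the wall-clock gain comes only from running them concurrently.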
Thanks Geraldine!! I browsed through the forum to find an example command line or documentation for executing HaplotypeCaller using scatter-gather. Could you direct me to an example?
@mehar
Hi,
This page contains many helpful documents: https://www.broadinstitute.org/gatk/guide/topic?name=queue
https://www.broadinstitute.org/gatk/guide/article?id=1311
https://www.broadinstitute.org/gatk/guide/article?id=1312
Also, you can watch the videos on Queue here:
https://www.broadinstitute.org/gatk/guide/presentations?id=3391 (at the bottom of the page)
-Sheila