Questions about multithreading parallelism

Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin
This discussion was created from comments split from: How can I use parallelism to make GATK tools run faster?.


  • xpastor Member
    edited February 2013

    I have access to a cluster with 8 nodes, each with 64 GB of RAM and 8 cores. I'm trying to process 30 samples using UnifiedGenotyper. Each sample is an exome of around 62e6 bases with an average coverage of around 60x. Can you give me any advice on configuring -nt/-nct to optimize performance, both with and without Queue?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Xavier,

    We don't have the resources right now to give case-by-case advice on configurations, so you'll need to experiment with your setup based on the general guidelines in the article. You may want to discuss it with the people who manage your cluster, as they may also have some helpful insights. Good luck!

  • trgall Member

    On the DepthOfCoverage page it says DepthOfCoverage supports -nt, but when run, the walker says it does not support parallel execution. Is there a quick way to find out (before running) which tools support each option?

    Also, some of the nodes I use have hyperthreading. Should I double the -nt (or -nct), or just use the physical number of cores?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    The tech doc is the best way to find out whether a tool supports parallelism or not. If it's listed as TreeReducible, DoC should support -nt. Can you please post the command line you tried and the error message that you got?

  • ecyeh Member

    We cannot run DepthOfCoverage with -nt either; we have tried both GATK 2.3-4 and 2.5-2.
    The command line is:
    java -Xmx2g -jar ./GenomeAnalysisTK.jar -R /repo/ref/ref.fasta -T DepthOfCoverage -o coverage_out_nt4 -I ./data/Sample1.bam -nt 2
    And we got this error message:

    ERROR MESSAGE: Invalid command line: Argument nt has a bad value: The analysis DepthOfCoverage aggregates results by interval. Due to a current limitation of the GATK, analyses of this type do not currently support parallel execution. Please run your analysis without the -nt option.
  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin
  • rsArgMar12 Member

    I am working on parallelizing a whole-exome pipeline. I split a BAM file by chromosome and perform recalibration and realignment separately on the 24 resulting BAM files. However, I have noticed that I get slightly different variant calls compared to the BAM file produced by a pipeline without parallelization. The difference appears at the realignment step, where the regions look differently aligned. I am wondering whether this is because RealignerTargetCreator and IndelRealigner are not properly supported with the scatter-gather technique (we are not using Queue to perform scatter-gather).
    Thank you.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi rsArgMar12,

    Can you please post a couple of screenshots showing the differences you see?

  • Tristan (La Jolla, CA) Member ✭✭

    Hello there! Was wondering if you could update this wonderful post to include recommendations for the HaplotypeCaller?

  • armen Member

    Does the above recommendation of 4 cluster nodes assume that exactly 4 nodes are available, so that more nodes could also be used if they were available? Or does it mean that even if more than 4 nodes are available, using more than 4 would actually slow down the process (due to I/O issues, maybe)?

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @Tristan,

    Sorry to get back to you so late, your comment seems to have slipped through my net. We're working on a set of new docs for HC, so I'll include an update for this doc as well. Can't promise an ETA though, might be a week or two before we get to it.

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @armen,

    Your first hypothesis is correct -- the example assumes 4 nodes. If you have more, feel free to use more. In our hands it takes a lot more before we see any I/O issues. But that can depend on your platform, so you may want to experiment a little before launching any important jobs.

  • alirezakj Member
    edited October 2013

    I have a 6-core i7 CPU that, with hyper-threading, gives me 12 logical cores, and I have 64 GB of DDR3 RAM. I want to run UG as fast as possible on this system. What would be the maximum -nt and -nct for it? Would this command make sense?
    java -jar GenomeAnalysisTK.jar \
    -R resources/Homo_sapiens_assembly18.fasta \
    -T UnifiedGenotyper \
    -I sample1.bam [-I sample2.bam ...] \
    --dbsnp dbSNP.vcf \
    -o snps.raw.vcf \
    -nt 12 \
    -nct 6


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi @alirezakj,

    We can't provide specific recommendations for determining what multithreading values make sense for individual systems; you'll need to experiment and figure it out on your own, sorry.
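
One way to run the kind of experiment suggested here is to time the same job at several thread counts. The following is only a minimal sketch: `time_command` is a hypothetical helper name, it measures whole seconds with `date +%s`, and `run_ug` in the commented loop stands in for your own wrapper around the UnifiedGenotyper command.

```shell
# Run a command, discard its output, and print the elapsed wall-clock seconds.
# Whole-second resolution is coarse but adequate for jobs that run for minutes.
# The function returns the exit status of the command that was timed.
time_command() {
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    end=$(date +%s)
    echo "$((end - start))"
    return "$status"
}

# Example loop over candidate thread counts. run_ug is a hypothetical function
# that you would define to launch your own command with -nt set to "$1":
# for nt in 1 2 4 6 8; do
#     printf 'nt=%s seconds=%s\n' "$nt" "$(time_command run_ug "$nt")"
# done
```

Running each setting on the same input and on an otherwise idle machine keeps the timings comparable.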

  • alirezakj Member

    Thank you, Geraldine, for the response. I totally understand why you don't provide any recommendations. As far as I know, people who run close to 100% CPU usage might face some issues (e.g., high CPU temperature, crashes in the middle of their jobs, or a slow system if they need to run something else).

    I figured out a better way to tune my variant calling for my system and for ME! For those of you who want to adjust -nt and -nct for your system faster than by running different commands with different -nt values and comparing the times, I have a recommendation.

    If you install a tool like sensors (command line) or psensor (GUI), you can monitor your CPU temperature and CPU usage in real time. I have 12 logical cores (hyper-threaded 6-core i7 at 3.9 GHz, and 64 GB RAM), and if I run UG with -nt 6 I get around 50% CPU usage and a temperature of about 55 degrees Celsius on each of my 6 physical cores. (I have a good cooling system, so even at about 100% CPU usage I don't have an overheating problem.)
    However, I personally don't like running a single command that uses about 85% of my CPU, so -nt 6 is perfect for me: it uses 50% and I don't get any overheating! (By the way, for those who have speed fever, I don't recommend overclocking your CPU for a bioinformatics job that might run for more than 5 hours.)

    Good luck with adjusting your -nt value

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Thanks for reporting this, @alirezakj -- I'm sure it will be useful for others.

  • alirezakj Member

    Thank YOU, Geraldine, for all of your efforts!
    By the way, I forgot to report that UG without an -nt value (the default) uses 9% of my CPU at a temperature of 37 C! So -nt certainly helps a lot, thanks to the GATK development team!
    Those who are very interested in speed, please keep in mind that a slow HDD is a speed bottleneck right there, and -nt might not help much beyond a certain value!
    I have 20 TB of RAID 0 HDD space (5 x 4 TB Western Digital Black, each 7200 RPM, SATA 6.0 Gb/s, 64 MB cache), which makes my I/O very fast!

  • Tristan (La Jolla, CA) Member ✭✭

    Howdy GATK folk!
    @Geraldine_VdAuwera - Thought you and the crowd here might benefit: we've found that PrintReads seems to stop scaling after about -nt 8 and doesn't seem to need much RAM for those 8 threads; a Java heap of around 8 GB (-Xmx8g) seems to be enough. Might just be our fun SSD-heavy HPC system though (speaking of disk speed).

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Thanks for reporting that, @Tristan! Sounds about right for PrintReads -- not a lot of processing going on there, the burden is mostly all I/O, unless you're recalibrating bases, which does take a little more for the calculations (but nothing crazy).

  • hns04 Member

    Hi Geraldine (@Geraldine_VdAuwera), I was wondering: if I split the BAM file for RealignerTargetCreator, will it affect the downstream analysis? Would you recommend doing this in order to speed up RealignerTargetCreator?
    @rsArgMar12, did you see any differences in your analysis?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    @hns04 You can safely split by chromosome for RTC, but if you split any smaller than that, you risk edge effects. If a split falls in the middle of a region that should be realigned, that region won't be identified correctly.
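
The per-chromosome split described in this reply could be scripted roughly as follows. This is a sketch under assumptions: `print_rtc_commands` is a hypothetical helper, the file names are placeholders, and it only prints one RealignerTargetCreator command per contig (restricted with `-L`) so the commands can be reviewed before being submitted to a scheduler.

```shell
# Read contig names from stdin (one per line) and print one
# RealignerTargetCreator command per contig, restricted to it with -L.
# Arguments: reference FASTA, input BAM, output directory (placeholder paths).
print_rtc_commands() {
    ref="$1"
    bam="$2"
    outdir="$3"
    while IFS= read -r contig; do
        [ -n "$contig" ] || continue    # skip blank lines
        printf 'java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R %s -I %s -L %s -o %s/%s.intervals\n' \
            "$ref" "$bam" "$contig" "$outdir" "$contig"
    done
}

# Example with two contigs:
# printf 'chr1\nchr2\n' | print_rtc_commands ref.fasta sample1.bam targets
```

Whole contigs are the smallest unit used here, in line with the advice that finer splits risk edge effects.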

  • coryfunk (Institute for Systems Biology) Member

    I'm running GATK with UnifiedGenotyper on several RNAseq files. Most of the time I'm able to specify -nt 16 and it will run on all 16 cores. On occasion, it refuses to use more than 1 core, even when I specify 16. The cluster resources are clearly available (no other jobs presently running). Does GATK or UnifiedGenotyper do any kind of memory or resource evaluation before initializing multiple threads? Is there a way to force it to only run when it has the resources to use all appointed threads?

  • coryfunk (Institute for Systems Biology) Member

    To be more specific, when the multithreading works, I see this:

    INFO 12:09:36,280 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 12:09:36,292 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01

    for each thread it initiates.

    When it doesn't work, I only see these two log lines once.

    That's the only difference in the stdout from GATK.

  • coryfunk (Institute for Systems Biology) Member

    I was able to figure out what the problem was. The bam file wasn't indexed (no .bai file present).
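
A small pre-flight check can catch the missing index before a multithreaded job is launched. The sketch below assumes the index sits next to the BAM under one of the two common names (`sample.bam.bai` or `sample.bai`); `has_bam_index` is a hypothetical helper name and `sample1.bam` is a placeholder.

```shell
# Succeed (exit status 0) if an index file is found next to the given BAM.
# Checks both common naming conventions: sample.bam.bai and sample.bai.
has_bam_index() {
    bam="$1"
    [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Example guard before launching a run:
# has_bam_index sample1.bam || { echo "sample1.bam is not indexed" >&2; exit 1; }
```

If the check fails, an index can be created with `samtools index sample1.bam` before rerunning.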

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    @coryfunk,

    Glad to hear you solved your problem. That said, please note that we don't recommend using UG on RNAseq data; you should use HaplotypeCaller instead. Be sure to follow the best practices described in the documentation for best results.
