GenotypeGVCFs Parallelism

lsturm Member
edited August 2016 in Ask the GATK team


I am trying to do joint genotyping with GenotypeGVCFs on about 250 exomes. I looked through the docs for the best way to parallelize this process, but didn't find a clear answer. Are -nt and -nct supported for GenotypeGVCFs? Are there recommendations for these parameters with this tool?

Thank you very much!


  • shlee Cambridge Member, Broadie, Moderator

    Hi @lsturm,

    One approach is to run parallel processes that scatter over genomic intervals. That is, you restrict each process to a different list of genomic intervals and then gather the outputs. Our new WDL scripts allow for this. This document gives an example of this type of scattering for a HaplotypeCaller step. For an intro to Cromwell/WDL, see this blogpost. The requirements are straightforward and described here.

  • shlee Cambridge Member, Broadie, Moderator

    Hi @lsturm,

    To clarify: I checked the GenotypeGVCFs documentation, and it lists -nt as an option for parallelization. This is an article that describes the different options in general terms. I hope this is helpful.

  • Thank you very much! Do you think using a scatter/gather approach through a custom pipeline with WDL/Cromwell would be more efficient than just running the call with -nt?

  • KateN Cambridge, MA Member, Broadie, Moderator

    Using both in conjunction would be the most efficient method. The multithreading done with the -nt option allows you to better use the power of whatever machine you are running on, whether that be in the cloud or on your own local machine.

    The scatter/gather implementation using a pipelining solution like WDL breaks the problem into independent pieces that run at the same time, which makes it faster still. If I had to choose one over the other, I would recommend scatter/gather over multithreading, but when you use both, you can see significant runtime improvements.

  • Thank you very much Kate!
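
To make the advice in this thread concrete, here is a minimal Python sketch of scattering GenotypeGVCFs over intervals while also passing -nt to each process. It only builds the command lines and does not run anything. It assumes GATK 3 style syntax (-T, -R, --variant, -L, -o). The jar path, reference name, gVCF names, and output names are all hypothetical, and the chromosome names depend on your reference. The even split of cores across concurrent shards is an illustrative choice, not a recommendation from the GATK team.

```python
import shlex

# Hypothetical paths; replace with your own files.
GATK_JAR = "GenomeAnalysisTK.jar"
REFERENCE = "reference.fasta"


def threads_per_shard(total_cores, concurrent_shards):
    """Divide a core budget evenly among shards that run at the same time."""
    return max(1, total_cores // concurrent_shards)


def scatter_commands(gvcfs, intervals, total_cores, concurrent_shards):
    """Build one GenotypeGVCFs command per interval (assumed GATK 3 syntax)."""
    nt = threads_per_shard(total_cores, concurrent_shards)
    commands = []
    for interval in intervals:
        cmd = ["java", "-jar", GATK_JAR, "-T", "GenotypeGVCFs", "-R", REFERENCE]
        for gvcf in gvcfs:
            cmd += ["--variant", gvcf]  # every shard sees all input gVCFs
        # Restrict this shard to one interval and give it its share of threads.
        cmd += ["-L", interval, "-nt", str(nt), "-o", f"joint.{interval}.vcf"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    gvcfs = [f"sample{i}.g.vcf" for i in range(1, 4)]  # hypothetical inputs
    intervals = ["chr1", "chr2", "chr3", "chr4"]       # assumed contig names
    for cmd in scatter_commands(gvcfs, intervals, total_cores=16, concurrent_shards=4):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Each printed command can be submitted as a separate job. The per-interval VCFs then have to be concatenated in genomic order as the gather step; a workflow runner such as Cromwell/WDL, as suggested above, handles the scheduling and gathering for you.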
