GenotypeGVCFs Parallelism

lsturm, Member
edited August 2016 in Ask the GATK team

Hello,

I am trying to do joint genotyping with GenotypeGVCFs on about 250 exomes. I looked through the docs for the best way to parallelize this process, but didn't find a clear answer. Are -nt and -nct supported for GenotypeGVCFs? Are there recommendations for these parameters with this tool?

Thank you very much!
Luke

Answers

  • shlee (Cambridge), Member, Broadie, Moderator admin

    Hi @lsturm,

    One approach is to run parallel processes that scatter over genomic intervals. That is, you restrict each process to a different genomic interval list, then gather the outputs. Our new WDL scripts allow for this. This document gives an example of this type of scattering for a HaplotypeCaller step. For an intro to Cromwell/WDL, see this blogpost. The requirements are straightforward and described here.

  • shlee (Cambridge), Member, Broadie, Moderator admin

    Hi @lsturm,

    To clarify, I checked the GenotypeGVCFs documentation, and it lists -nt as an option for parallelization. This is an article that describes the different options in general terms. I hope this is helpful.

  • lsturm, Member

    Thank you very much! Do you think using a scatter/gather approach through a custom pipeline with WDL/Cromwell would be more efficient than just running the call with -nt?

  • KateN (Cambridge, MA), Member, Broadie, Moderator admin

    Using both in conjunction would be the most efficient method. The multithreading done with the -nt option allows you to better use the power of whatever machine you are running on, whether that be in the cloud or on your own local machine.

    Scatter/gather with a pipelining solution like WDL splits the problem into independent pieces that run at the same time, which makes it run even faster. If I had to choose one over the other, I would recommend scatter/gather over multithreading, but when you use both, you can see significant runtime improvements.

  • lsturm, Member

    Thank you very much Kate!
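
As a rough illustration of the advice in this thread, the sketch below splits a list of intervals into contiguous shards and builds one GenotypeGVCFs command per shard, each with its own -nt setting. It assumes a GATK 3-style command line (`-T`, `-R`, `--variant`, `-L`, `-nt`, `-o`). The file names, the jar path, the thread count, and the helper functions `scatter_intervals` and `genotype_command` are hypothetical, chosen only for the example. It is not the WDL implementation mentioned above.

```python
from typing import List


def scatter_intervals(intervals: List[str], n_shards: int) -> List[List[str]]:
    """Split intervals into at most n_shards contiguous groups.

    Contiguous groups keep the original interval order, so the per-shard
    outputs can later be gathered by concatenating them in shard order.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    if not intervals:
        return []
    n_shards = min(n_shards, len(intervals))
    size, extra = divmod(len(intervals), n_shards)
    shards, start = [], 0
    for i in range(n_shards):
        # The first `extra` shards take one additional interval each.
        end = start + size + (1 if i < extra else 0)
        shards.append(intervals[start:end])
        start = end
    return shards


def genotype_command(
    gvcfs: List[str],
    shard: List[str],
    out_vcf: str,
    reference: str = "reference.fasta",   # hypothetical path
    threads: int = 4,                     # hypothetical -nt value
    jar: str = "GenomeAnalysisTK.jar",    # hypothetical path
) -> List[str]:
    """Build one GenotypeGVCFs command restricted to the shard's intervals."""
    cmd = ["java", "-jar", jar, "-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    for interval in shard:
        cmd += ["-L", interval]
    cmd += ["-nt", str(threads), "-o", out_vcf]
    return cmd


if __name__ == "__main__":
    chromosomes = [str(c) for c in range(1, 23)] + ["X", "Y"]
    gvcfs = ["sample1.g.vcf", "sample2.g.vcf"]  # hypothetical inputs
    for i, shard in enumerate(scatter_intervals(chromosomes, 4)):
        print(" ".join(genotype_command(gvcfs, shard, f"shard_{i}.vcf")))
```

Each printed command can be submitted as a separate job. The per-shard VCFs then need to be gathered in shard order into a single callset. Which gather tool to use depends on your GATK version, so check its documentation.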
