
GenotypeGVCFs Parallelism

lsturm Member
edited August 2016 in Ask the GATK team

Hello,

I am trying to do joint genotyping with GenotypeGVCFs on about 250 exomes. I looked through the docs for the best way to parallelize this process but didn't find a clear answer. Are -nt and -nct supported for GenotypeGVCFs? Are there recommended values for these parameters with this tool?

Thank you very much!
Luke

Answers

  • shlee Cambridge Member, Broadie, Moderator

    Hi @lsturm,

    One approach is to run parallel processes that scatter over genomic intervals: restrict each process to a different genomic intervals list, then gather the outputs. Our new WDL scripts allow for this. This document gives an example of this type of scattering for a HaplotypeCaller step. For an intro to Cromwell/WDL, see this blog post. The requirements are straightforward and described here.
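    As a rough illustration of the scatter step, the Python sketch below builds one GenotypeGVCFs command per interval shard. It assumes GATK 3.x command-line syntax (`-T GenotypeGVCFs`, `-R`, `--variant`, `-L`, `-o`, `-nt`); the jar path, reference path, output names, and the helper `build_scatter_commands` are hypothetical placeholders. In a real pipeline a WDL `scatter` block would play this role, and the per-shard VCFs would still need to be gathered afterwards (for example with a VCF concatenation tool such as Picard's GatherVcfs).

```python
from typing import List, Sequence


def build_scatter_commands(
    gvcfs: Sequence[str],
    intervals: Sequence[str],
    reference: str = "reference.fasta",      # hypothetical path
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical path
    threads: int = 1,
) -> List[List[str]]:
    """Return one GenotypeGVCFs command (as an argv list) per interval shard.

    Assumes GATK 3.x syntax. Each shard is restricted with -L to its own
    interval (or interval list file), so shards can run independently.
    """
    commands = []
    for i, interval in enumerate(intervals):
        cmd = ["java", "-jar", gatk_jar, "-T", "GenotypeGVCFs", "-R", reference]
        for gvcf in gvcfs:
            cmd += ["--variant", gvcf]        # one --variant per sample gVCF
        cmd += ["-L", interval, "-o", f"shard_{i:03d}.vcf"]
        if threads > 1:
            cmd += ["-nt", str(threads)]      # optional data threads per shard
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    samples = [f"sample_{n}.g.vcf" for n in range(1, 4)]  # stand-ins for ~250 exomes
    shards = ["chr1", "chr2", "chr3"]                     # or one .interval_list per shard
    for argv in build_scatter_commands(samples, shards, threads=4):
        print(" ".join(argv))
```

    Each shard writes its own output file, which keeps the shards fully independent until the gather step.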

  • shlee Cambridge Member, Broadie, Moderator

    Hi @lsturm,

    Just to clarify: I checked the GenotypeGVCFs documentation, and it lists -nt as an option for parallelization. This article describes the different parallelism options in general terms. I hope this is helpful.

  • Thank you very much! Do you think using a scatter/gather approach through a custom pipeline with WDL/Cromwell would be more efficient than just running the call with -nt?

  • KateN Cambridge, MA Member, Broadie, Moderator

    Using both in conjunction would be the most efficient method. The multithreading done with the -nt option allows you to better use the power of whatever machine you are running on, whether that be in the cloud or on your own local machine.

    A scatter/gather implementation using a pipelining solution like WDL breaks the problem into independent pieces that can run at the same time, so it finishes even faster. If I had to choose one, I would recommend scatter/gather over multithreading, but using both together gives significant runtime improvements.
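    To make the combination concrete: on a machine with C cores, if each shard runs with -nt T, then roughly C // T shards can run at once without oversubscribing the CPU. The sketch below uses only Python's standard library to launch shard commands concurrently under that budget and return their exit codes. The helper names are hypothetical, and this is a local stand-in for the scheduling a workflow engine such as Cromwell would do, not a description of how Cromwell works.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def concurrent_shards(total_cores: int, threads_per_shard: int) -> int:
    """Number of shards that fit at once without oversubscribing cores."""
    if threads_per_shard < 1:
        raise ValueError("threads_per_shard must be >= 1")
    return max(1, total_cores // threads_per_shard)


def run_shards(commands: Sequence[Sequence[str]], total_cores: int,
               threads_per_shard: int) -> List[int]:
    """Run each shard command as a subprocess, bounded by the core budget.

    Returns exit codes in the same order as `commands`.
    """
    workers = concurrent_shards(total_cores, threads_per_shard)
    # Threads suffice here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda argv: subprocess.run(list(argv)).returncode,
                             commands))
```

    For example, on a 16-core node with -nt 4, four shards would run at a time; any shard with a non-zero exit code would need to be rerun before gathering.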

  • Thank you very much Kate!
