How to parallelise HaplotypeCaller in 4.0.0?

KrzysztofMKozak (Cambridge, UK), Member

Hello,

At this point, what is the recommended way to parallelise HaplotypeCaller in GATK4, please? I care about the results and would rather not use the Spark version. In particular, what effect does nativePairHmmThreads have? So far it has made no difference to the speed of my runs, and yet runs with identical parameters can vary drastically in duration (the longest have taken over a month on a very diverse insect).

Best Answer

  • Sheila (Broad Institute, admin)
    Accepted Answer

    @KrzysztofMKozak
    Hi,

    Spark tools are the only way to achieve parallelization in GATK4. From the developer:

    "In GATK4, the way to make a tool multithreaded is to implement it as a Spark tool. All Spark tools can be trivially parallelized across multiple threads using the local runner, and across a cluster using spark-submit or gcloud. We wanted to avoid the complexities of implementing our own map/reduce framework, as was done in previous versions of the GATK, and instead rely on a standard, third-party framework to keep the GATK4 engine as simple as possible."

    The number of threads used by NativePairHMM relates to Intel's optimized native PairHMM implementation, which you can read more about here.

    I hope that helps.

    -Sheila
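
A minimal sketch of the two threading options discussed in this thread, written in Python and assuming the GATK 4.0 `gatk` wrapper script is on `PATH`. It only assembles command lines: `HaplotypeCallerSpark` with Spark's local runner, where `--spark-master local[N]` asks Spark for N worker threads on one machine, and the non-Spark `HaplotypeCaller` with `--native-pair-hmm-threads`. Assumptions, not confirmed by the posts above: the kebab-case flag spellings (the question uses the older camelCase `nativePairHmmThreads`), the hypothetical file names, and that the native thread count only matters when the Intel-accelerated PairHMM kernel is in use. Check `gatk <ToolName> --help` for your exact version.

```python
import shlex
from typing import List


def haplotypecaller_spark_cmd(ref: str, bam: str, out_vcf: str, threads: int) -> List[str]:
    """Build a HaplotypeCallerSpark command that uses Spark's local runner.

    "local[N]" tells Spark to run N worker threads inside one JVM on this machine.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        "gatk", "HaplotypeCallerSpark",
        "-R", ref,
        "-I", bam,
        "-O", out_vcf,
        "--spark-master", f"local[{threads}]",
    ]


def haplotypecaller_cmd(ref: str, bam: str, out_vcf: str, pairhmm_threads: int) -> List[str]:
    """Build a plain (non-Spark) HaplotypeCaller command.

    Assumption: the engine itself walks the data on one thread, and
    --native-pair-hmm-threads only affects the Intel-accelerated PairHMM kernel.
    """
    if pairhmm_threads < 1:
        raise ValueError("pairhmm_threads must be at least 1")
    return [
        "gatk", "HaplotypeCaller",
        "-R", ref,
        "-I", bam,
        "-O", out_vcf,
        "--native-pair-hmm-threads", str(pairhmm_threads),
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    spark = haplotypecaller_spark_cmd("ref.fasta", "sample.bam", "sample.vcf", 8)
    plain = haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.vcf", 4)
    # shlex.join (Python 3.8+) quotes arguments such as local[8] for the shell.
    print(shlex.join(spark))
    print(shlex.join(plain))
```

Passing either list to `subprocess.run(cmd, check=True)` would launch the job. Building the arguments as a list rather than a single string avoids shell globbing of the `local[N]` value.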

Answers

  • afzm (Member)
    Does that mean that without Spark the tool will not use more than one core?
  • bhanuGandham (Cambridge, MA; Member, Administrator, Broadie, Moderator, admin)

    @afzm

    That is correct.