Undocumented use of CPU resources

marchoeppner Kiel, Germany Member

Hi,

I have been trying to familiarize myself with GATK and noticed a behavior that I think is problematic. Specifically, I am trying to call variants from RNA-seq data using this guide: https://www.broadinstitute.org/gatk/guide/article?id=3891

Part of this processing chain is the GATK "module" SplitNCigarReads. According to the documentation, this module does not accept -nt or -nct arguments to increase parallelism. However, on my system it greedily consumes all CPUs it can see. In a shared environment this is not really OK, since it leads to oversubscription of compute resources. For example, assuming 1 CPU per job, I launched a pipeline that runs 10 of these jobs in parallel on the same node, so naturally I am seeing problems related to oversubscribed CPUs.

Is this behavior intended?

Cheers,

Marc

Issue · Github
by Sheila

Issue Number: 267
State: open

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi Marc,

    No, GATK itself does not use additional CPUs unless you turn on multithreading. However, Java will use additional CPUs if available for tasks like garbage collection. There are Java arguments you can use to reduce or disable this parallel behavior, but these are not GATK-specific, so we don't document them ourselves. I believe -XX:ParallelGCThreads=1 is one of them.
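
A minimal sketch of how a pipeline wrapper might apply the suggestion above, assuming a GATK 3-style invocation (`java -jar GenomeAnalysisTK.jar -T SplitNCigarReads ...`). The helper name `build_split_n_cigar_command` and the file names are hypothetical and not part of GATK. The one thing it illustrates is that `-XX:ParallelGCThreads` is a JVM option, so it has to appear before `-jar` rather than among the GATK arguments.

```python
import shlex


def build_split_n_cigar_command(jar, reference, input_bam, output_bam, gc_threads=1):
    """Build the argv list for a SplitNCigarReads run with capped GC threads.

    Hypothetical helper for illustration; file names are placeholders.
    """
    if gc_threads < 1:
        raise ValueError("gc_threads must be at least 1")
    return [
        "java",
        # JVM option: must come before -jar, otherwise it is passed to GATK.
        f"-XX:ParallelGCThreads={gc_threads}",
        "-jar", jar,
        "-T", "SplitNCigarReads",
        "-R", reference,
        "-I", input_bam,
        "-o", output_bam,
    ]


cmd = build_split_n_cigar_command(
    "GenomeAnalysisTK.jar", "ref.fasta", "dedupped.bam", "split.bam"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Any further SplitNCigarReads arguments recommended by the RNA-seq guide would be appended after `-o`; they are left out here to keep the sketch short.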
