Undocumented use of CPU resources
I have been trying to familiarize myself with GATK and have noticed a behavior that I think is problematic. Specifically, I am trying to call variants from RNA-seq data using this guide: https://www.broadinstitute.org/gatk/guide/article?id=3891
Part of this processing chain is the GATK "module" SplitNCigarReads. According to the documentation, this module does not accept the -nt or -nct arguments for increasing parallelism. However, on my system it greedily consumes every CPU it can see. In a shared environment this is not really OK, since it leads to oversubscription of compute resources. For example, on the assumption that each job needs 1 CPU, I launched a pipeline that runs 10 of these jobs in parallel on the same node, so naturally I am seeing problems related to oversubscribed CPUs.
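To put rough numbers on the problem, here is a minimal sketch of the arithmetic. It assumes each job really does keep one busy thread per visible core, and the 16-core node size and the function name are hypothetical, chosen only for illustration.

```python
def oversubscription_factor(jobs, threads_per_job, cores):
    """Ratio of busy threads to cores; anything above 1.0 is oversubscribed."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return jobs * threads_per_job / cores


CORES = 16  # hypothetical node size, not a measured value
JOBS = 10   # number of SplitNCigarReads jobs launched on the node

# What I planned for: every job stays on a single CPU.
planned = oversubscription_factor(JOBS, 1, CORES)

# What happens if every job instead uses all visible cores (assumption).
worst_case = oversubscription_factor(JOBS, CORES, CORES)

print(f"planned load factor:    {planned}")
print(f"worst-case load factor: {worst_case}")
```

Under those assumptions the node goes from comfortably under-used to carrying ten times more busy threads than it has cores.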
Is this behavior intended?