(3.9.0‐3.9.1) Default ThreadsPerCore Slurm setting causes reduced CPU utilization
ParallelCluster does not explicitly set ThreadsPerCore in the Slurm compute node configuration, so Slurm falls back to its default value of 1. Slurm 23.11 introduced a change that requires the ThreadsPerCore setting to match the number of threads per physical core of the underlying instance. For compute resources where multi-threading has not been disabled, this mismatch limits CPU utilization to around 50%.
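The following back-of-the-envelope model in Python gives a rough sense of where the 50% figure comes from. It is a simplification, not a description of Slurm internals. It assumes that Slurm places work on at most ThreadsPerCore hardware threads of each physical core. Under that assumption, the usable share of an instance's vCPUs is the configured value divided by the instance's actual threads per core. The function name estimated_cpu_utilization is hypothetical.

```python
def estimated_cpu_utilization(configured_tpc: int, instance_tpc: int) -> float:
    """Rough fraction of an instance's vCPUs that Slurm can put work on.

    Simplifying assumption (not Slurm's actual algorithm): Slurm uses at most
    `configured_tpc` hardware threads on each physical core.
    """
    if configured_tpc < 1 or instance_tpc < 1:
        raise ValueError("threads per core must be positive integers")
    usable_threads = min(configured_tpc, instance_tpc)
    return usable_threads / instance_tpc


# Multi-threading enabled (2 threads per core), Slurm left at its default of 1.
print(f"{estimated_cpu_utilization(1, 2):.0%}")  # 50%
# ThreadsPerCore set to the instance's DefaultThreadsPerCore.
print(f"{estimated_cpu_utilization(2, 2):.0%}")  # 100%
```

With multi-threading disabled, the instance exposes one thread per core. Slurm's default of 1 then already matches and the estimate is 100%, which is why only multi-threaded compute resources are affected.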
Affected versions and operating systems:
- ParallelCluster 3.9.0, 3.9.1
- Slurm 23.11.4
- All operating systems supported by ParallelCluster
To mitigate the issue, set the ThreadsPerCore value explicitly through the CustomSlurmSettings property of every compute resource in your cluster configuration that has multi-threading enabled (the default).
The steps are as follows:
- For each compute resource where multi-threading is enabled, add the following section:
CustomSlurmSettings:
  ThreadsPerCore: <default-threads-per-core>
Note: You can determine the default-threads-per-core value of an instance type by running this command:
aws ec2 describe-instance-types --instance-types <instance-type> --region <region> | grep DefaultThreadsPerCore
- For the changes to take effect, update your existing clusters, or create new clusters, with the updated configuration file by following the cluster update instructions in the ParallelCluster documentation.
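The steps above can also be scripted. The following minimal Python sketch is not an official ParallelCluster tool, and it rests on a few assumptions:

- You saved the JSON output of the aws ec2 describe-instance-types command shown above. This is the CLI's default output, which reports the value under InstanceTypes[].VCpuInfo.DefaultThreadsPerCore.
- You already loaded your cluster configuration file into a Python dict with a YAML library such as PyYAML.

The helpers threads_per_core_by_type and add_threads_per_core are hypothetical names. The sketch skips compute resources that set DisableSimultaneousMultithreading: true. For compute resources that list several instance types under Instances, it raises an error instead of guessing when their threads per core differ.

```python
import json


def threads_per_core_by_type(describe_output: str) -> dict[str, int]:
    """Map each instance type to its DefaultThreadsPerCore.

    `describe_output` is the JSON text printed by
    `aws ec2 describe-instance-types --instance-types <instance-type> ...`.
    """
    data = json.loads(describe_output)
    return {
        item["InstanceType"]: item["VCpuInfo"]["DefaultThreadsPerCore"]
        for item in data["InstanceTypes"]
    }


def add_threads_per_core(config: dict, tpc_by_type: dict[str, int]) -> list[str]:
    """Set CustomSlurmSettings.ThreadsPerCore on multi-threaded compute resources.

    Mutates `config` (a parsed cluster configuration) in place and returns the
    names of the compute resources that were updated.
    """
    updated = []
    for queue in config["Scheduling"]["SlurmQueues"]:
        for resource in queue["ComputeResources"]:
            # Multi-threading disabled: Slurm's default of 1 already matches.
            if resource.get("DisableSimultaneousMultithreading", False):
                continue
            # A compute resource uses either InstanceType or a list of Instances.
            if "InstanceType" in resource:
                types = [resource["InstanceType"]]
            else:
                types = [inst["InstanceType"] for inst in resource["Instances"]]
            values = {tpc_by_type[t] for t in types}
            if len(values) != 1:
                raise ValueError(
                    f"{resource['Name']}: instance types disagree on threads per core"
                )
            # Keep any other CustomSlurmSettings already present.
            resource.setdefault("CustomSlurmSettings", {})["ThreadsPerCore"] = values.pop()
            updated.append(resource["Name"])
    return updated


# Example with a trimmed-down describe-instance-types response for c5.xlarge
# (4 vCPUs = 2 cores x 2 threads per core).
sample_output = json.dumps({
    "InstanceTypes": [
        {
            "InstanceType": "c5.xlarge",
            "VCpuInfo": {"DefaultVCpus": 4, "DefaultCores": 2, "DefaultThreadsPerCore": 2},
        }
    ]
})
config = {
    "Scheduling": {
        "Scheduler": "slurm",
        "SlurmQueues": [{
            "Name": "queue1",
            "ComputeResources": [
                {"Name": "cr1", "InstanceType": "c5.xlarge"},
                {"Name": "cr2", "InstanceType": "c5.xlarge",
                 "DisableSimultaneousMultithreading": True},
            ],
        }],
    }
}
print(add_threads_per_core(config, threads_per_core_by_type(sample_output)))  # ['cr1']
print(config["Scheduling"]["SlurmQueues"][0]["ComputeResources"][0]["CustomSlurmSettings"])
# {'ThreadsPerCore': 2}
```

Write the modified dict back to your configuration file before updating or creating the cluster. PyYAML's yaml.safe_dump can do this, but it drops any comments in the original file.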
Note: If your system is configured with more than one thread per core, running a different job on each thread is not supported. However, a single job can run one task per thread within one job step, or run a distinct job step on each thread. This behavior is described in the official Slurm documentation.