KAI Scheduler

KAI Scheduler is a robust, efficient, and scalable Kubernetes scheduler that optimizes GPU resource allocation for AI and machine learning workloads.

KAI Scheduler is designed to manage large-scale GPU clusters, including thousands of nodes, and a high throughput of workloads, which makes it well suited to extensive and demanding environments. It allows Kubernetes cluster administrators to dynamically allocate GPU resources to workloads.

KAI Scheduler supports the entire AI lifecycle, from small, interactive jobs that require minimal resources to large training and inference workloads, all within the same cluster. It ensures optimal resource allocation while maintaining resource fairness between the different consumers. It can run alongside other schedulers installed on the cluster.

Key Features

Batch Scheduling: Ensure all pods in a group are scheduled simultaneously or not at all.
Bin Packing & Spread Scheduling: Optimize node usage either by minimizing fragmentation (bin-packing) or increasing resiliency and load balancing (spread scheduling).
Workload Priority: Prioritize workloads effectively within queues.
Hierarchical Queues: Manage workloads with two-level queue hierarchies for flexible organizational control.
Resource Distribution: Customize quotas, over-quota weights, limits, and priorities per queue.
Fairness Policies: Ensure equitable resource distribution using Dominant Resource Fairness (DRF) and resource reclamation across queues.
Workload Consolidation: Reallocate running workloads intelligently to reduce fragmentation and increase cluster utilization.
Elastic Workloads: Dynamically scale workloads within defined minimum and maximum pod counts.
Dynamic Resource Allocation (DRA): Support vendor-specific hardware resources through Kubernetes ResourceClaims (e.g., GPUs from NVIDIA or AMD).
GPU Sharing: Allow multiple workloads to efficiently share single or multiple GPUs, maximizing resource utilization.
Cloud & On-premise Support: Fully compatible with dynamic cloud infrastructures (including auto-scalers like Karpenter) as well as static on-premise deployments.

Prerequisites

Before installing KAI Scheduler, ensure you have:

A running Kubernetes cluster
Helm CLI installed
NVIDIA GPU Operator installed, which is required to schedule workloads that request GPU resources
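The CLI side of the list above can be checked with a small shell helper. This is only a sketch: `check_prereqs` is a hypothetical function name, not something shipped with KAI Scheduler, and it only confirms that the named tools are on `PATH`. It does not verify that the cluster is reachable or that the GPU Operator is installed.

```shell
#!/bin/sh
# check_prereqs: hypothetical helper (not part of KAI Scheduler).
# Prints one line per tool and returns non-zero if any tool is missing.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command is available.
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_prereqs kubectl helm`, run before starting the installation.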

Installation

KAI Scheduler will be installed in the kai-scheduler namespace. When submitting workloads, make sure to use a dedicated namespace for them.

Installation Methods

KAI Scheduler can be installed:

From Production (Recommended)
From Source (Build it Yourself)

Install from Production

helm repo add nvidia-k8s https://helm.ngc.nvidia.com/nvidia/k8s
helm repo update
helm upgrade -i kai-scheduler nvidia-k8s/kai-scheduler -n kai-scheduler --create-namespace --set "global.registry=nvcr.io/nvidia/k8s"
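After the commands above finish, it can help to confirm that the release and its pods are in place. The following is a minimal sketch, not an official verification step: `verify_kai_install` is a hypothetical helper, and the `HELM` and `KUBECTL` variables are assumptions added here so the CLIs can be swapped out. The release name and namespace are the ones used in the install command.

```shell
#!/bin/sh
# Release name and namespace as used in the install command above.
KAI_RELEASE="kai-scheduler"
KAI_NAMESPACE="kai-scheduler"

# verify_kai_install: hypothetical helper (not part of KAI Scheduler).
# HELM and KUBECTL are optional overrides; they default to the real CLIs.
verify_kai_install() {
  # Show the status of the Helm release.
  "${HELM:-helm}" status "$KAI_RELEASE" --namespace "$KAI_NAMESPACE" || return 1
  # List the pods in the scheduler's namespace.
  "${KUBECTL:-kubectl}" get pods --namespace "$KAI_NAMESPACE" || return 1
}
```

Calling `verify_kai_install` against a live cluster prints the release status followed by the pod list; which pods appear depends on the chart version, so the sketch does not assume any particular pod names.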

Build from Source

Follow the instructions here

Quick Start

To start scheduling workloads with KAI Scheduler, continue to the Quick Start example.

Support and Getting Help

We’d love to hear from you! Here's how to reach out:

Technical Questions, Bugs, and Feature Requests: Please open an issue on GitHub for anything related to technical support, bug reports, or feature suggestions. This helps us track and address them efficiently.
General Discussion & Roadmap Topics: For broader conversations, such as roadmap discussions, scheduling strategies, or working group coordination, join the CNCF Slack workspace and drop by the #batch-wg channel.
