prime-rl is a codebase for decentralized RL training at scale.
## Quick Install

```bash
curl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/install.sh | bash
```

## Manual Setup
- Clone the repository:

  ```bash
  git clone git@github.com:PrimeIntellect-ai/prime-rl.git
  cd prime-rl
  ```
- Install uv:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  source $HOME/.local/bin/env
  ```
- Set up the environment:

  ```bash
  uv venv --python 3.10
  source .venv/bin/activate
  uv sync
  uv pip install flash-attn --no-build-isolation
  ```
- Install the pre-commit hooks:

  ```bash
  uv run pre-commit install
  ```
- Run the tests:

  ```bash
  uv run pytest
  ```
## Debug Runs

- Training:

  ```bash
  uv run torchrun --nproc_per_node=2 src/zeroband/train.py @ configs/training/debug.toml
  ```

- Inference:

  ```bash
  uv run python src/zeroband/inference.py @ configs/inference/debug.toml
  ```
## Qwen 1.5B Debug Math Run

This run uses two terminals. In the first, start inference:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
export VLLM_WORKER_MULTIPROC_METHOD=spawn
uv run python src/zeroband/inference.py @ configs/inference/Qwen1.5B/debug_math.toml
```

Then, in the second terminal, start the trainer:

```bash
ulimit -n 4096
export CUDA_VISIBLE_DEVICES=6,7
uv run torchrun --nproc_per_node=2 src/zeroband/train.py @ configs/training/Qwen1.5B/debug_math.toml
```
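The `ulimit -n 4096` step raises the shell's soft open-file limit before launching the trainer (which otherwise can hit "too many open files"). A minimal sketch for checking and raising it in the trainer's shell:

```shell
# Check this shell's current soft open-file limit.
CURRENT_LIMIT=$(ulimit -n)
echo "current open-file limit: $CURRENT_LIMIT"

# Raise it for this session; this fails if the hard limit is lower,
# in which case an administrator must raise the hard limit first.
ulimit -n 4096 2>/dev/null || echo "note: could not raise limit to 4096"
echo "open-file limit now: $(ulimit -n)"
```

The change only affects the current shell, so run it in the same terminal as the trainer.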
## Qwen 1.5B Full Run

As above, use two terminals. In the first, start inference:

```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5
export VLLM_WORKER_MULTIPROC_METHOD=spawn
uv run python src/zeroband/inference.py @ configs/inference/Qwen1.5B/Qwen1.5B.toml
```

Then, in the second terminal, start the trainer:

```bash
ulimit -n 4096
export CUDA_VISIBLE_DEVICES=6,7
uv run torchrun --nproc_per_node=2 src/zeroband/train.py @ configs/training/Qwen1.5B/Qwen1.5b.toml
```
If running on an H100 node instead of an H200, add `--train.micro_bs 4` to the trainer command.
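One way to apply this override is to append the flag conditionally based on the detected GPU. A sketch, assuming the GPU name comes from `nvidia-smi` (hard-coded below so the snippet runs without a GPU):

```shell
# GPU_NAME would normally come from:
#   nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1
# It is hard-coded here for illustration.
GPU_NAME="NVIDIA H100 80GB HBM3"

# H100s have less memory than H200s, so lower the micro batch size on them.
EXTRA_ARGS=""
case "$GPU_NAME" in
  *H100*) EXTRA_ARGS="--train.micro_bs 4" ;;
esac

echo "uv run torchrun --nproc_per_node=2 src/zeroband/train.py @ configs/training/Qwen1.5B/Qwen1.5b.toml $EXTRA_ARGS"
```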