
blockfer-rp/rl-swarm

RL Swarm

RL Swarm is an open source system for peer-to-peer reinforcement learning over the internet. Running a swarm node allows you to train your personal model against the swarm intelligence. Each swarm performs RL reasoning as a group, with a gossiping system (Hivemind) for collaborative improvement between models. You can also connect your node to the Gensyn Testnet, to receive an on-chain identity that tracks your progress over time.

RL Swarm is fully open and permissionless, meaning you can run it on a basic consumer laptop at home or on a powerful GPU in the cloud. You can also experiment with different models to see which ones perform best.

Requirements

Ensure that you are using a supported machine/device/environment:

  • arm64 or x86 CPU with at least 16 GB of RAM

OR

  • CUDA devices (officially supported):
    • RTX 3090
    • RTX 4090
    • A100
    • H100

WITH

  • Python >= 3.10 (on macOS, you will likely need to upgrade)
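To confirm the Python requirement before creating the virtual environment, a minimal POSIX shell sketch is shown below. It is not part of the repository: the function names `version_at_least` and `check_python` are hypothetical, and the comparison assumes versions of the form `major.minor[.patch]` (patch levels are ignored).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with RL Swarm): checks that python3 is >= 3.10.

# version_at_least HAVE WANT
# Succeeds (exit status 0) if version HAVE (e.g. "3.11.4") is at least WANT (e.g. "3.10").
# Assumes both arguments look like "major.minor[.patch]"; the patch level is ignored.
version_at_least() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  # Newer major version wins outright; same major version compares the minor part.
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# check_python: prints the detected version and succeeds only if it is at least 3.10.
check_python() {
  if ! command -v python3 >/dev/null 2>&1; then
    echo "python3 not found on PATH" >&2
    return 1
  fi
  # Ask the interpreter itself for its "major.minor" version.
  py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
  echo "Found Python $py_version"
  version_at_least "$py_version" "3.10"
}
```

For example, running `check_python || echo "Please install Python 3.10 or newer" >&2` before `python3 -m venv .venv` surfaces an outdated interpreter early, which is the usual problem on macOS.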

Instructions:

Run the swarm

python3 -m venv .venv
source .venv/bin/activate
./run_rl_swarm.sh

Testnet participation

When prompted, answer 'Y' (or just press Enter). 'N' is provided as an alternative flow but isn't currently maintained.

Login

  1. A browser window will pop open (you'll need to manually navigate to http://localhost:3000/ if you're on a VM).
  2. Click 'login'.
  3. Login with your preferred method.

Huggingface

Optionally, pair your Hugging Face (HF) account by providing your HF token.

Initial peering and training

From this stage onward, your device will be used to train a hyperscale machine learning system. You should see your peer register and vote on-chain.

Troubleshooting

  • My model doesn't seem to be training?

    • If you're using a consumer device (e.g. a MacBook), it is likely just running slowly - check back in 20 minutes.
  • Logging in with a new account after previous login?

    • Make sure you click 'Logout' on the login screen before you leave your previous session.
    • Make sure you delete swarm.pem from the root directory (try sudo rm swarm.pem). If you don't do this, and you previously registered with the peer-id stored in this file, it will disrupt the training process.
  • Issues on VMs?

    • How do I access the login screen if I'm running in a VM?: Use port forwarding. Add the SSH flag -L 3000:localhost:3000 when connecting to your VM. E.g. gcloud compute ssh --zone "us-central1-a" [your-vm] --project [your-project] -- -L 3000:localhost:3000
    • Disconnection/general issues: If you are tunneling to a VM and suffer a broken pipe, you will likely encounter OOM errors or unexpected behaviour the first time you relaunch the script. Press Ctrl+C to kill the script, which should spin down all background processes, then restart the script and everything should work normally.
  • Issues with npm/general installation?

    • Try npm install -g node@latest
  • OOM errors on MacBook?

    • Try this (experimental) fix to increase memory:
      export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0
      
  • I have a Windows machine, can I still train a model on the swarm?: Yes - but this is not very well tested and may require you to do some debugging to get it set up properly. Install WSL and Linux on your Windows machine using the following instructions: https://learn.microsoft.com/en-us/windows/wsl/install

  • I have multiple GPUs on one machine, can I run multiple peers?: Yes - but you'll need to make some manual changes. You'll need to isolate each GPU, install a separate copy of this repo for each GPU, and expose each peer on a different port so that each one can get through the login modal during onboarding.

  • My round/stage is behind the smart contract/other peers?: This is expected behaviour given the different speeds of machines in the network. Once your machine completes its current round, it will move on to the round the rest of the swarm is on.

  • I want to use a bigger and/or different model in the RL swarm, can I do that?: Yes - but we only recommend doing so if you are comfortable manually changing files and appropriately configuring the model(s) you wish to run for your device(s). You'll simply need to edit the config file in ./hivemind_exp/configs/<directory_relevant_to_your_device>/grpo-qwen-2.5-0.5b-deepseek-r1.yaml to reflect the model_name_or_path and training arguments corresponding to what you want in the swarm. Note that, although any pre-trained LLM compatible with Hugging Face's AutoModelForCausalLM class should work in theory, we have only tested with a handful of Qwen 2.5 instruction-tuned models.

  • I am running a model in the swarm on my CPU, have received a Python RuntimeError, and my training progress seems to have stopped: There are several possible causes for this, but before trying anything, wait long enough to be sure your training is actually frozen and not just slow (e.g., wait longer than a single training iteration has previously taken on your machine). If you're sure training is actually frozen, here are some things to try:

    • Set this (experimental) fix: export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 && ./run_rl_swarm.sh
    • In the config for your device (./hivemind_exp/configs/<directory_relevant_to_your_device>/grpo-qwen-2.5-0.5b-deepseek-r1.yaml), add the training argument max_grad_norm with a value of 0.5 (in YAML: max_grad_norm: 0.5).
    • Use 32-bit floating point (float32) instead of bfloat16 to train your model. This can be changed in the config for your device, i.e. ./hivemind_exp/configs/<directory_relevant_to_your_device>/grpo-qwen-2.5-0.5b-deepseek-r1.yaml.
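The config edits mentioned in the troubleshooting list (changing model_name_or_path, adding max_grad_norm, switching away from bfloat16) can also be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes each key is a top-level `key: value` line in the YAML file (nested keys need a real YAML tool), and the `bf16` key name is borrowed from Hugging Face `TrainingArguments`; check your device's config for the key it actually uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of RL Swarm): set a top-level "key: value" entry in a YAML file.
# Assumes the key, if present, starts at column 0; nested keys are not handled.
# VALUE must not contain "|" or "&", since it is passed through sed.

# set_yaml_key FILE KEY VALUE
set_yaml_key() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}:" "$file"; then
    # Rewrite the existing line in place. "-i.bak" is accepted by both GNU and BSD sed
    # and leaves the previous version of the file in FILE.bak.
    sed -i.bak "s|^${key}:.*|${key}: ${value}|" "$file"
  else
    # Make sure a non-empty file ends with a newline before appending a new entry.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}
```

For example, with CONFIG set to your device's grpo-qwen-2.5-0.5b-deepseek-r1.yaml, `set_yaml_key "$CONFIG" max_grad_norm 0.5` adds or updates the gradient-clipping argument, and `set_yaml_key "$CONFIG" bf16 false` turns bfloat16 off, assuming your config uses that key.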
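For running one peer per GPU, a common way to isolate each GPU is the CUDA_VISIBLE_DEVICES environment variable, which restricts a process to the listed device(s). The sketch below only prints a per-peer plan. The per-GPU checkout directories (rl-swarm-gpu0, rl-swarm-gpu1, ...) and the port scheme (3000 plus the GPU index) are hypothetical, and making run_rl_swarm.sh actually use a different port is one of the manual changes this sketch does not cover.

```shell
#!/bin/sh
# Hypothetical sketch: plan one RL Swarm peer per GPU.
# Assumptions (not from the repository): one checkout per GPU named rl-swarm-gpuN,
# and a login port of BASE_PORT + N for the peer on GPU N.

BASE_PORT=3000

# peer_port N: port for the peer bound to GPU N
peer_port() {
  echo $((BASE_PORT + $1))
}

# plan_peers COUNT: print the GPU, directory and port for each of COUNT peers
plan_peers() {
  i=0
  while [ "$i" -lt "$1" ]; do
    # CUDA_VISIBLE_DEVICES=N makes only GPU N visible to that peer's processes.
    echo "CUDA_VISIBLE_DEVICES=$i dir=rl-swarm-gpu$i port=$(peer_port "$i")"
    i=$((i + 1))
  done
}
```

Running `plan_peers 2` prints one line per GPU; each peer would then be started from its own checkout with its CUDA_VISIBLE_DEVICES value exported, so the peers never compete for the same device.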

Swarm UI

To launch the Swarm UI, run docker-compose up --build and open 0.0.0.0:8080 in your browser.

See the web/README for more details.

About

A fully open source framework for creating RL training swarms over the internet.
