This project implements a training pipeline that teaches the DeepSeek language model to generate robot designs in SDF (Simulation Description Format). Generated designs are evaluated in the Gazebo simulator, and the simulation outcome is fed back as a reinforcement learning signal.
The utilities come from Strong Compute, and the project is built to run on Strong Compute infrastructure. See https://github.com/StrongResearch/isc-demos for more details.
- Gazebo Simulator: follow the installation instructions at https://gazebosim.org/docs/latest/getstarted/
- Python dependencies: pip install -r requirements.txt
- Clone the repository:
  git clone https://github.com/yourusername/deepseek-robot-trainer
  cd deepseek-robot-trainer
- Install Python dependencies:
  pip install -r requirements.txt
- Verify the Gazebo installation:
  gz sim shapes.sdf
  You should see the Gazebo simulator launch with some basic shapes.
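Before starting a long training run, it can help to confirm that the `gz` command is actually on the `PATH` of the environment that will run training. The snippet below is a minimal sketch of such a check; the helper name `find_missing_tools` is hypothetical and not part of this repository.

```python
import shutil


def find_missing_tools(tools=("gz",)):
    """Return the names of command-line tools that are not on PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that the executable can be found. It does not replace the `gz sim shapes.sdf` check, which also confirms that the simulator starts.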
python train.py --dataset_id /path/to/model --chk_path /path/to/checkpoints
- Loads the DeepSeek model with LoRA adapters
- Generates robot designs in SDF format
- Validates designs using Gazebo simulator
- Uses simulation success/failure as reward signals
- Updates the model using GRPO (Group Relative Policy Optimization)
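To illustrate the last step, GRPO scores each generated design relative to the other designs sampled for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative normalization, not the full policy update. It is not taken from `train.py`: the function name is hypothetical, and the choice of population standard deviation and the `eps` value are assumptions (implementations differ on both).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions.

    Illustrative sketch only. Each advantage is the reward minus the
    group mean, divided by the group standard deviation. `eps` avoids
    division by zero when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some code uses sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four designs sampled for one prompt: two simulated cleanly, two failed.
advantages = group_relative_advantages([1.0, -1.0, 1.0, -1.0])
```

With a binary success/failure reward, a group in which every design succeeds or every design fails yields all-zero advantages, so that prompt contributes no learning signal for the step.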
The reward function evaluates generated SDF code by:
- Extracting code between tags
- Writing to a temporary SDF file
- Running Gazebo simulation
- Checking for simulation errors
- Returning +1 for successful simulations, -1 for failures
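The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual reward function: the tag name `answer`, the function names, the headless `gz sim -s -r --iterations 100` invocation, and the check for `[Err]` in the simulator's stderr are all assumptions. The simulator command is passed in as a parameter so it can be swapped for the command the project really uses.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional, Sequence

# Assumption: the README does not name the delimiting tags; "answer" is a placeholder.
TAG = "answer"
# Assumption: run the server headless for a fixed number of iterations.
DEFAULT_SIM_CMD = ("gz", "sim", "-s", "-r", "--iterations", "100")


def extract_code(completion: str, tag: str = TAG) -> Optional[str]:
    """Return the text between <tag> and </tag>, or None if the tags are missing."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", completion, re.DOTALL)
    return match.group(1).strip() if match else None


def sdf_reward(
    completion: str,
    sim_cmd: Sequence[str] = DEFAULT_SIM_CMD,
    timeout: float = 60.0,
) -> float:
    """Return +1.0 if the extracted SDF simulates without errors, else -1.0."""
    code = extract_code(completion)
    if code is None:
        return -1.0  # nothing to simulate counts as a failure

    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate design to a temporary SDF file.
        sdf_path = Path(tmp) / "candidate.sdf"
        sdf_path.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [*sim_cmd, str(sdf_path)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return -1.0  # hung simulation or missing simulator binary

    # Assumption: a nonzero exit code or an "[Err]" line on stderr means failure.
    failed = result.returncode != 0 or "[Err]" in result.stderr
    return -1.0 if failed else 1.0
```

Treating a timeout as a failure keeps a single hung simulation from stalling training. Checking stderr in addition to the exit code is a defensive choice, in case the simulator reports a problem in its log without exiting with a nonzero status.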