DeepSeek Training with Simulation

This project implements a pipeline that trains the DeepSeek language model to generate robot designs, using the Gazebo simulator for validation. Gazebo evaluates each generated design, written in SDF (Simulation Description Format), and the outcome is fed back as the reward signal for reinforcement learning.

The utilities come from Strong Compute, and the project is built to run on Strong Compute infrastructure. See https://github.com/StrongResearch/isc-demos for more details.

Required Software

Gazebo Simulator: follow the installation instructions at https://gazebosim.org/docs/latest/getstarted/
Python Dependencies:
```
pip install -r requirements.txt
```

Installation

Clone the repository:

```
git clone https://github.com/yourusername/deepseek-robot-trainer
cd deepseek-robot-trainer
```

Install Python dependencies:
```
pip install -r requirements.txt
```
Verify Gazebo installation:
```
gz sim shapes.sdf
```
You should see the Gazebo simulator launch with some basic shapes.

Usage

```
python train.py --dataset_id /path/to/model --chk_path /path/to/checkpoints
```
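A minimal sketch of how a training script might accept the two flags from the usage line, using Python's standard `argparse`. Only the flag names `--dataset_id` and `--chk_path` come from this README; the `parse_args` helper, the help strings, and making both flags required are assumptions, not the actual contents of `train.py`.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags shown in the usage line (hypothetical helper, not from train.py)."""
    parser = argparse.ArgumentParser(
        description="Train DeepSeek to generate SDF robot designs"
    )
    # Flag names match the README; required=True is an assumption.
    parser.add_argument("--dataset_id", required=True, help="Path to the base model")
    parser.add_argument("--chk_path", required=True, help="Directory for saving checkpoints")
    # argv=None falls back to sys.argv[1:], as argparse does by default.
    return parser.parse_args(argv)
```

Calling `parse_args(["--dataset_id", "/path/to/model", "--chk_path", "/path/to/checkpoints"])` returns a namespace whose `dataset_id` and `chk_path` attributes hold those two paths.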

Training Pipeline

1. Loads the DeepSeek model with LoRA adapters
2. Generates robot designs in SDF format
3. Validates the designs in the Gazebo simulator
4. Uses simulation success or failure as the reward signal
5. Updates the model using GRPO (Group Relative Policy Optimization)
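GRPO (Group Relative Policy Optimization) scores each completion relative to the other completions sampled for the same prompt, instead of against a learned value function. The sketch below shows only that group-relative advantage step, normalizing each reward by the group's mean and standard deviation. The function name and the `eps` guard are illustrative; a full GRPO update also applies a clipped policy-ratio objective and a KL penalty, both omitted here, and this is not taken from `train.py`.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Each advantage is (r - group mean) / (group std + eps); eps avoids division by
    zero when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled designs: one simulated successfully (+1), three failed (-1).
advantages = group_relative_advantages([1.0, -1.0, -1.0, -1.0])
```

With +1/-1 rewards, the successful design gets a positive advantage (about 1.73) and each failure a negative one (about -0.58). A group in which every design succeeds, or every design fails, yields all-zero advantages, so that prompt contributes no learning signal.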

Reward Function

The reward function evaluates generated SDF code by:

1. Extracting the code between tags
2. Writing it to a temporary SDF file
3. Running a Gazebo simulation
4. Checking for simulation errors
5. Returning +1 for successful simulations and -1 for failures
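The reward steps can be sketched in Python as follows. This is a hedged illustration, not the code in `train.py`. The README does not name the delimiting tags, so `<answer>`/`</answer>` are placeholders. The headless invocation `gz sim -s -r --iterations N` (server only, run on start, stop after N iterations) and treating any `[Err]` line in Gazebo's output as a failure are assumptions to check against `gz sim --help` for your Gazebo version.

```python
import re
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional

# Placeholder delimiters: the README does not say which tags wrap the SDF code.
OPEN_TAG = "<answer>"
CLOSE_TAG = "</answer>"


def extract_sdf(completion: str, open_tag: str = OPEN_TAG, close_tag: str = CLOSE_TAG) -> Optional[str]:
    """Return the text between the first open/close tag pair, or None if absent."""
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def simulate_sdf(sdf_text: str, iterations: int = 100, timeout_s: float = 60.0) -> bool:
    """Run Gazebo headless on the SDF text; True only if it exits cleanly without errors."""
    if shutil.which("gz") is None:
        return False  # Gazebo is not on PATH: count as a failed simulation
    with tempfile.TemporaryDirectory() as tmp:
        sdf_path = Path(tmp) / "candidate.sdf"
        sdf_path.write_text(sdf_text)
        try:
            # Assumed flags: -s (server only), -r (run on start), --iterations (stop after N steps).
            result = subprocess.run(
                ["gz", "sim", "-s", "-r", "--iterations", str(iterations), str(sdf_path)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except (subprocess.TimeoutExpired, OSError):
            return False  # hung or failed to launch: count as a failure
    output = result.stdout + result.stderr
    # Assumption: Gazebo prefixes error log lines with "[Err]".
    return result.returncode == 0 and "[Err]" not in output


def sdf_reward(completion: str) -> float:
    """+1.0 if the extracted SDF simulates successfully, otherwise -1.0."""
    sdf_text = extract_sdf(completion)
    if sdf_text is None:
        return -1.0  # no tagged code in the completion
    return 1.0 if simulate_sdf(sdf_text) else -1.0
```

A completion with no tagged code, a missing `gz` binary, a timeout, or any reported error all map to -1, so the trainer only ever sees the two reward values listed above.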

Repository Files

.gitignore
README.md
deepseek-r1-llama-8b.isc
fdsp_utils.py
requirements.txt
simple_robot.sdf
train.py

Source repository: ltejedor/deepseek-training-with-simulation

Folders and files

Latest commit

History

Repository files navigation

DeepSeek Training with Simulation

Required Software

Installation

Usage

Training Pipeline

Reward Function

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages