This project implements an advanced algorithm designed to navigate a drone within a simulated room to locate a target. It combines a reinforcement learning approach with an interactive environment. The system simulates commands for a Tello Edu drone, with features for retraining the model to adapt to new target positions.
N.B.: I highly recommend turning off any translation tool for a better reading experience!
- Features
- Main Architecture
- Installation
- Usage
- Modules Overview
- Acknowledgements
- Contacts
- Entire Lore For Interested People
- Room Simulation: Customizable 3D room environment for drone navigation.
- Target Detection: Intelligent algorithm to locate a target in the simulated room.
- Reinforcement Learning: Implements Q-Learning for trajectory optimization.
- Visualization: Real-time 3D trajectory plotting for training and performance monitoring.
- Dynamic Updates: Allows reconfiguration of the target's location with retraining capabilities.
- Replay Mechanism: Replays the best navigation trajectory using generated commands.
Reinforcement_Learning_Navigating_Drone/
│
├── dronecmds.py
├── FunctionsLib.py
├── best_episode_commands.py
├── ChangingTarget.py
├── HowTo
└── README.md
- dronecmds.py: Provides fundamental drone operations such as movement, positioning, and target detection.
- FunctionsLib.py: Includes the primary classes and functions for training, reward computation, and command smoothing.
- best_episode_commands.py: Stores and replays the best episode commands after training.
- ChangingTarget.py: Allows users to update the target position dynamically and retrain the model.
- Clone this repository:
git clone https://github.com/Warukho/Reinforcement-Learning-Navigating-Drone.git
cd Reinforcement-Learning-Navigating-Drone
1. Training, Changing Target and Simulation
To dynamically update the target position and retrain the model, run:
python ChangingTarget.py
2. Replay the Best Episode
To replay the optimal trajectory after training:
python best_episode_commands.py
dronecmds.py
Core module for drone simulation commands:
createRoom(description, height): Sets up the environment.
createDrone(droneId, viewerId, progfunc): Initializes the drone and its visualization.
locate(x, y, heading): Positions the drone.
takeOff(), land(): Controls the drone's flight.
Movement commands like forward(n), backward(n), goUp(n), goDown(n), etc.
isTargetDetected(): Checks if the drone detects the target.
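To give a feel for the API, here is a minimal hand-written sketch combining these commands; the room description string, drone/viewer identifiers, start pose, and distances are illustrative assumptions, not values from this project:

```python
# Minimal hand-written sketch of the dronecmds API. The room description,
# drone/viewer identifiers, start pose, and distances are illustrative
# placeholders, not values from this project.
from dronecmds import (createRoom, createDrone, locate, takeOff, land,
                       forward, goUp, isTargetDetected)

createRoom("(0 0, 500 0, 500 500, 0 500)", 300)   # footprint + height (cm)
createDrone("DRONE_PLACEHOLDER", "VIEWER_PLACEHOLDER", progfunc=None)
locate(50, 50, 90)        # x, y, heading
takeOff()
forward(100)              # move 100 cm forward
goUp(50)
if isTargetDetected():    # True once the drone is within detection range
    land()
```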
FunctionsLib.py
Utilities for training, environment setup, and command smoothing:
initialize_settings(): Configures environment and training parameters.
DroneVirtual: Simulates the drone’s behavior in the room.
training_loop(env_with_viewer, num_episodes, max_steps_per_episode): Conducts the reinforcement learning loop.
get_training_results(env_with_viewer): Fetches the best trajectory and commands.
smooth_commands(commands): Optimizes command sequences for efficiency.
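Based on the signatures above, a training session might look like the following sketch; the episode budget, and the assumptions that `initialize_settings` returns the viewer-wrapped environment and that `get_training_results` returns a (trajectory, commands) pair, are illustrative:

```python
# Assumed workflow built from the functions above: configure, train, then
# extract and compress the best run. Return shapes and the episode budget
# are illustrative assumptions.
from FunctionsLib import (initialize_settings, training_loop,
                          get_training_results, smooth_commands)

env_with_viewer = initialize_settings()            # environment + parameters
training_loop(env_with_viewer, num_episodes=500, max_steps_per_episode=100)
best_trajectory, best_commands = get_training_results(env_with_viewer)
replay_script = smooth_commands(best_commands)     # merge consecutive moves
print(replay_script)
```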
best_episode_commands.py
Handles:
Storing the optimal trajectory commands.
Replaying the trained navigation strategy.
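Since the stored commands map directly onto dronecmds calls, replaying can be as simple as the sketch below; the (name, argument) storage format is a hypothetical illustration, not the module's actual format:

```python
import dronecmds

# Hypothetical storage format: (function name, argument) pairs recorded
# during the best episode; None marks argument-free commands.
best_commands = [("takeOff", None), ("forward", 100), ("goUp", 50), ("land", None)]

for name, arg in best_commands:
    command = getattr(dronecmds, name)   # look up the dronecmds call by name
    command() if arg is None else command(arg)
```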
ChangingTarget.py
Facilitates:
Dynamic updates to the target position.
Retraining the drone for the new position.
Updating the replay commands for the new trajectory.
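Putting the pieces together, the script's flow is roughly the sketch below; `env.target_position` and the argument values are stand-ins, since the actual names are defined in FunctionsLib.py:

```python
from FunctionsLib import (initialize_settings, training_loop,
                          get_training_results, smooth_commands)

# Hypothetical flow: place a new target, retrain, regenerate replay commands.
env = initialize_settings()
env.target_position = (250, 300, 150)   # stand-in attribute for the new (x, y, z) target
training_loop(env, num_episodes=500, max_steps_per_episode=100)
_, best_commands = get_training_results(env)
new_replay = smooth_commands(best_commands)   # handed to best_episode_commands.py
```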
Chauvet Pierre : Developed the dronecmds, mplext, viewermpl, and viewertk modules for drone operations, as well as the dronecore, images, and Tests files.
Bouchaud--Roche Axel : Worked on reinforcement learning and dynamic environment adaptation.
For questions or contributions, please contact:
Chauvet Pierre :
Github
Email : pierre.chauvet@uco.fr
Bouchaud--Roche Axel :
Github
Email : axelbouchaudroche@gmail.com
Here lies a complete and detailed explanation of the process; make sure you have some time before you begin reading:
The Reinforcement Learning Navigating Drone project was developed as part of a class assignment for the "Algorithmic and Programming 1" course at IMA (Angers, France). The task was to create an algorithm to control a drone in a 3D room and locate a target. While the project was initially designed to be solved with simple loops and logic, I opted to take an advanced approach by integrating Q-learning, a reinforcement learning algorithm, leveraging my prior experience with Python programming.
The original assignment required:
- Designing a straightforward exploration algorithm using basic drone commands such as `forward`, `rotateLeft`, and `goUp`, combined with loops.
- Implementing and testing the algorithm in Python within a simulated environment.
- Creating a generalized version of the algorithm to adapt to rooms of varying sizes.
Instead of following the conventional approach, I implemented Q-learning to enable the drone to learn optimal navigation strategies autonomously. This decision was driven by my two years of Python experience during high school, which provided the confidence to explore advanced techniques.
Initially, the project environment provided by Pierre Chauvet was incompatible with my setup. To address this, I utilized ChatGPT+, but only for debugging purposes. This tool helped me identify and resolve technical issues efficiently, allowing me to focus on algorithm development without compromising learning objectives.
I chose Q-learning for its:
- Efficiency: The drone could learn from its actions and improve its ability to locate the target.
- Scalability: The algorithm generalized well to different room sizes and layouts without requiring additional hardcoding.
- Advanced Learning: This approach offered an opportunity to deepen my understanding of reinforcement learning while exceeding the assignment’s expectations.
- Improving My Knowledge: Learning how to use Q-learning was a whole journey through new Python libraries and mathematical concepts.
- Simulated Environment:
  - Implemented a 3D coordinate system to define the room and target locations.
  - Utilized basic drone commands such as `takeOff`, `land`, and directional movements (`forward(n)`, `rotateLeft(n)`).
- State and Action Representation:
  - States: Represented as the drone's position in the room (x, y, z).
  - Actions: Included the movement commands.
- Reward System (a reward-shaping sketch follows this list):
  - Positive rewards for reducing the distance to the target.
  - High rewards for detecting the target (within 5 cm in this code).
  - Penalties for inefficiency, such as revisiting previous states.
- Dynamic Adaptation:
  - Retraining the model for different room dimensions or target locations required no major code modifications.
- Smoothing Movement (see the grouping sketch below):
  - The drone's final trajectory is smoothed by grouping consecutive commands, which yields a shorter command script and reduces visual clutter during the simulation.
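As referenced in the Reward System item above, a minimal reward-shaping sketch could look like this; beyond the 5 cm detection threshold stated above, the reward magnitudes and the visited-state bookkeeping are assumptions:

```python
import numpy as np

def compute_reward(position, target, previous_distance, visited_states):
    """Illustrative reward shaping following the rules above; the reward
    magnitudes and revisit penalty are assumptions, not the exact values."""
    distance = np.linalg.norm(np.array(position) - np.array(target))
    if distance <= 5:                    # target detected within 5 cm
        return 100.0
    reward = 1.0 if distance < previous_distance else -1.0   # progress shaping
    if tuple(position) in visited_states:
        reward -= 0.5                    # discourage revisiting states
    return reward
```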
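And for the smoothing step, the grouping idea reduces to merging consecutive commands of the same kind; the (name, amount) command format is an assumed representation:

```python
def smooth_commands(commands):
    """Merge consecutive commands of the same kind, e.g. forward(20) followed
    by forward(30) becomes forward(50). The (name, amount) pair format is an
    assumed representation of the generated commands."""
    smoothed = []
    for name, amount in commands:
        if smoothed and smoothed[-1][0] == name:
            smoothed[-1] = (name, smoothed[-1][1] + amount)  # extend the last move
        else:
            smoothed.append((name, amount))
    return smoothed

print(smooth_commands([("forward", 20), ("forward", 30), ("goUp", 50), ("goUp", 10)]))
# [('forward', 50), ('goUp', 60)]
```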
- The drone successfully navigated simulated rooms and located targets with high efficiency.
- The algorithm proved adaptable to varying room configurations and target positions.
- The project demonstrated the practical application of reinforcement learning, particularly in balancing exploration vs. exploitation and managing state discretization.
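To make the exploration-exploitation balance and state discretization concrete, here is a generic tabular Q-learning sketch of the kind this approach builds on; the hyperparameters, action set, and 10 cm cell size are illustrative assumptions, not the project's exact values:

```python
import random
import numpy as np

# Generic tabular Q-learning with epsilon-greedy action selection, the
# textbook form of the approach used here. Hyperparameters, the action set,
# and the discretization cell size are illustrative assumptions.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2
N_ACTIONS = 6            # e.g. forward/backward/left/right/up/down
Q = {}                   # discretized (x, y, z) state -> action-value array

def discretize(position, cell=10):
    return tuple(int(c // cell) for c in position)

def choose_action(state):
    q = Q.setdefault(state, np.zeros(N_ACTIONS))
    if random.random() < EPSILON:        # explore: random action
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q))             # exploit: best known action

def update(state, action, reward, next_state):
    q_next = Q.setdefault(next_state, np.zeros(N_ACTIONS))
    Q[state][action] += ALPHA * (reward + GAMMA * q_next.max() - Q[state][action])
```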
This project went beyond the original requirements and provided an opportunity to explore advanced methodologies in reinforcement learning. While the assignment's scope was introductory, integrating Q-learning allowed me to:
- Push my technical boundaries and apply advanced methods to a real-world problem. By using Q-learning, I transformed a basic task into a meaningful exploration of reinforcement learning.
- Reinforce the importance of persistence in overcoming technical challenges. Using ChatGPT+ for debugging highlighted the value of leveraging tools responsibly to complement problem-solving and learning.
Overall, this project showcased my ability to adapt and innovate while meeting the course requirements. It was an enriching journey that strengthened both my programming expertise and my understanding of machine learning principles.
Special thanks to Pierre Chauvet for the course framework and project guidance. The project's evolution was driven by both foundational principles and the flexibility to explore advanced concepts.