A comprehensive implementation of machine learning algorithms from scratch using only NumPy, Pandas, and Matplotlib. This project aims to provide clear, well-documented implementations to help understand the underlying mechanics of popular ML algorithms without relying on high-level libraries.
- Pure Python implementations with minimal dependencies
- Comprehensive documentation and explanations for each algorithm
- Jupyter notebooks demonstrating real-world applications
- Modular design allowing for easy extension and customization
- Test suite ensuring correctness of implementations
- Visualizations to understand model behavior
Algorithm | Description | Status |
---|---|---|
Linear Regression | Simple and multiple linear regression with gradient descent | ✅ |
Logistic Regression | Binary and multi-class classification | ✅ |
Decision Tree | Classification and regression trees | ✅ |
More models will be added in the future.
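As an illustration of the gradient descent approach named in the table, here is a minimal NumPy sketch. It is not the repository's `LinearRegression` class; the function name `fit_linear_gd`, its defaults, and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def fit_linear_gd(X, y, learning_rate=0.05, n_iters=1000):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error.

    Hypothetical helper for illustration; not part of this repository.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y  # prediction error, shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ residual)  # dMSE/dw
        grad_b = (2.0 / n_samples) * residual.sum()    # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data with known coefficients (3, -2) and intercept 1
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + rng.normal(scale=0.1, size=200)

w, b = fit_linear_gd(X, y)
print("weights:", w, "intercept:", b)
```

With standardized features like these, the learned weights should land close to the values used to generate the data; unscaled features usually need a smaller learning rate.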
# Clone the repository
git clone https://github.com/ItsTSH/ML-From-Scratch
cd ML-From-Scratch
# Create a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up PYTHONPATH to resolve module imports
# Linux/MacOS:
export PYTHONPATH="$PYTHONPATH:$(pwd)"
# Windows (Command Prompt):
set PYTHONPATH=%PYTHONPATH%;%cd%
# Windows (PowerShell):
$env:PYTHONPATH += ";$pwd"
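To confirm that the PYTHONPATH setup took effect, a quick check along these lines may help. It is only a sketch: the helper name `is_importable` is hypothetical, and the package names `models` and `utils` are taken from the project layout in this README.

```python
import importlib.util


def is_importable(package_name):
    """Return True if Python can locate the package on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None


# Package names taken from the project layout; adjust if yours differ
for name in ("models", "utils"):
    status = "found" if is_importable(name) else "NOT found (check PYTHONPATH)"
    print(f"{name}: {status}")
```

Run it from the same shell session in which you set PYTHONPATH, since the variable is not shared across terminals unless it is set permanently.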
Graphviz is required only for the decision tree experiment; you can skip this step if you don't intend to run that experiment file:
🔧 Step 1: Install Graphviz (Windows)
- Go to https://graphviz.org/download/
- Under Windows, click the Stable Release Windows installer (e.g., graphviz-xxx.exe)
- Install it with default settings
🔧 Step 2: Add Graphviz to System PATH
After installation:
- Go to: C:\Program Files\Graphviz\bin
- Copy this path
- Open System Environment Variables → Environment Variables
- Under System variables, find and select Path, click Edit
- Click New and paste: C:\Program Files\Graphviz\bin
- Click OK and apply the changes
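After updating PATH, open a new terminal and verify that the Graphviz `dot` executable is visible. The following is a minimal sketch using only the standard library; the helper name `graphviz_dot_path` is made up for this example.

```python
import shutil
import subprocess


def graphviz_dot_path():
    """Return the full path of the Graphviz `dot` executable, or None if not on PATH."""
    return shutil.which("dot")


dot = graphviz_dot_path()
if dot is None:
    print("Graphviz 'dot' was not found on PATH; re-check the steps above.")
else:
    # `dot -V` reports its version on stderr, so read both streams
    result = subprocess.run([dot, "-V"], capture_output=True, text=True)
    print((result.stderr or result.stdout).strip())
```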
For a permanent setup, add the project directory to PYTHONPATH in your environment variables:
Linux/MacOS:
Add to your ~/.bashrc, ~/.zshrc, or equivalent:
export PYTHONPATH="$PYTHONPATH:/path/to/ML-From-Scratch"
Windows:
- Search for "Environment Variables" in Start menu
- Click "Edit the system environment variables"
- Click "Environment Variables"
- Under "User variables" or "System variables", find "PYTHONPATH" or create it
- Add the path to your project directory:
C:\path\to\ML-From-Scratch
import numpy as np
from models.linearRegression import LinearRegression
# Create and train a linear regression model
# (X_train, y_train, X_test, y_test are assumed to be prepared NumPy arrays)
model = LinearRegression(learning_rate=0.01, n_iters=1000)
model.fit(X_train, y_train)
# Make predictions
predictions = model.predict(X_test)
# Evaluate the model
mse = model.evaluate(X_test, y_test)
print(f"Mean Squared Error: {mse}")
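The snippet above assumes that `X_train`, `X_test`, `y_train`, and `y_test` already exist. One way to produce them with NumPy alone is sketched below. The function name `split_train_test` is hypothetical and is not the project's own `train_test_split` helper, whose exact behavior is not documented here.

```python
import numpy as np


def split_train_test(X, y, test_size=0.2, seed=0):
    """Shuffle the rows and hold out a fraction for testing (illustrative helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))          # random row order
    n_test = int(round(len(X) * test_size))    # number of held-out rows
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy data: 10 samples with 2 features each
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

X_train, X_test, y_train, y_test = split_train_test(X, y)
print(X_train.shape, X_test.shape)
```

Fixing the seed keeps the split reproducible between runs, which makes results easier to compare while experimenting.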
# Run a linear regression experiment
python experiments/linearRegressionExperiment.py
Navigate to the notebooks/ directory and start Jupyter:
jupyter notebook
Then open the notebook corresponding to the algorithm you're interested in.
ml-from-scratch/
├── models/           # Core implementations of each model
├── utils/            # Helper functions and common components
├── datasets/         # Sample datasets
├── notebooks/        # Jupyter notebooks to demonstrate each model
├── tests/            # Unit tests
├── experiments/      # Scripts to run experiments
├── visualizations/   # Generated plots and graphs
├── reports/          # LaTeX reports of models
├── README.md
└── requirements.txt
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from utils.preprocessing import train_test_split, normalize
from models.linearRegression import LinearRegression
# Dataset
X, y = make_regression(n_samples=200, n_features=1, noise=15, random_state=1)
X = normalize(X) # Optional normalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Model
model = LinearRegression(learning_rate=0.05, n_iters=1000, verbose=True)
model.fit(X_train, y_train)
# Evaluation
results = model.evaluate(X_test, y_test)
print("Evaluation:", results)
# Visualize predictions vs actual data
y_pred_line = model.predict(X)
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color="blue", label="Actual Data", alpha=0.6)
plt.plot(X, y_pred_line, color="red", label="Regression Line", linewidth=2)
plt.xlabel("Feature")
plt.ylabel("Target")
plt.title("Linear Regression - Fit Visualization")
plt.legend()
plt.grid(True)
plt.show()
Run the test suite to verify all algorithms are working correctly:
pytest tests/
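For contributors adding tests, a pytest-style check might look roughly like the sketch below. It does not reproduce any test from this repository: `closed_form_fit` is a stand-in model built on `np.linalg.lstsq`, used so the example runs on its own. In a real test you would presumably call the model under test instead.

```python
import numpy as np


def closed_form_fit(X, y):
    """Stand-in model: least-squares weights and intercept via np.linalg.lstsq."""
    X_aug = np.column_stack([X, np.ones(len(X))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]


def test_recovers_known_coefficients():
    """On noiseless data, the fit should return the generating parameters."""
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 3))
    true_w, true_b = np.array([1.5, -0.5, 2.0]), 0.7
    y = X @ true_w + true_b
    w, b = closed_form_fit(X, y)
    np.testing.assert_allclose(w, true_w, atol=1e-8)
    np.testing.assert_allclose(b, true_b, atol=1e-8)
```

pytest collects any function whose name starts with `test_` in files named `test_*.py`, so saving this under tests/ would be enough for `pytest tests/` to pick it up.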
Performance comparison with scikit-learn implementations (on a MacBook Pro M1, 16 GB RAM):
Algorithm | Our Implementation | scikit-learn | Slowdown (ours / scikit-learn) |
---|---|---|---|
Linear Regression | 0.85s | 0.12s | 7.1x |
Logistic Regression | 1.23s | 0.18s | 6.8x |
Decision Tree | 4.56s | 0.31s | 14.7x |
Our implementations are roughly 7x to 15x slower than scikit-learn, as expected: they prioritize clarity and educational value over performance optimization.
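The benchmark script behind these numbers is not shown here. As a rough sketch of how such a ratio can be measured, the example below times a plain Python loop against a vectorized NumPy call. The helper `time_call` and both workload functions are hypothetical stand-ins, not the models from the table.

```python
import time

import numpy as np


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def slow_sum_of_squares(values):
    """Element-by-element Python loop (stand-in for a from-scratch routine)."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def fast_sum_of_squares(values):
    """Vectorized equivalent (stand-in for an optimized library routine)."""
    return float(np.dot(values, values))


data = np.random.default_rng(0).normal(size=100_000)
t_slow = time_call(slow_sum_of_squares, data)
t_fast = time_call(fast_sum_of_squares, data)
print(f"loop: {t_slow:.4f}s  vectorized: {t_fast:.6f}s  ratio: {t_slow / t_fast:.1f}x")
```

Taking the best of several runs reduces noise from other processes; absolute timings will still vary by machine.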
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by numerous machine learning textbooks and courses
- Special thanks to all contributors who help improve this educational resource
- Dataset sources: UCI Machine Learning Repository and Kaggle
Tejinder Singh Hunjan: tejindersingh0784@gmail.com
Mohit Kamkhalia: mohitkamkhalia@gmail.com