🚛 Production-ready Master Data Table for Drivers in mobility platforms 📊 Validated, Automated & Optimized using SQL, Airflow & Python 🛠️ Designed and implemented with industry-grade data engineering practices
The `driverMasterData_database_table` project is a robust, production-level pipeline to generate a single source of truth for all driver-level attributes across multiple raw and processed data systems.
This repository showcases end-to-end ownership of a critical data table in a real business environment, from validation and data ingestion to automation and fault detection. The project standardizes driver data for analytics, alerts, and platform-level logic.
- 🛠️ SQL Indexing for fast query performance
- 🔄 Daily Updates via Airflow DAGs and Cronjobs
- 🔍 Input Validations for schema and value consistency
- 🚨 Triggers for Fallacy Detection and anomaly reporting
- 🧩 Modular Design with plug-and-play data inputs
- 📈 Compatible with Slack bots, alert systems, and analytics pipelines
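The README does not show how the input validations are implemented. The following is a minimal sketch that assumes a hypothetical input schema (`driver_id`, `city`, `onboarding_date`, `is_active`) and made-up value rules. It shows a schema-and-value check that splits a batch into clean and rejected rows. It uses only the standard library, whereas the project itself lists `pandas` and custom assertions, so treat it as illustrative rather than the project's actual checks.

```python
from datetime import date

# Hypothetical schema: column name -> expected Python type.
# The real master table's columns are not documented in this README.
EXPECTED_SCHEMA = {
    "driver_id": str,
    "city": str,
    "onboarding_date": date,
    "is_active": bool,
}


def validate_row(row: dict) -> list[str]:
    """Return a list of human-readable problems found in one input row."""
    errors = []
    # Schema check: every expected column must be present with the right type.
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            errors.append(f"missing column '{column}'")
        elif not isinstance(row[column], expected_type):
            errors.append(
                f"'{column}' should be {expected_type.__name__}, "
                f"got {type(row[column]).__name__}"
            )
    # Value checks (illustrative rules only, not the project's actual rules).
    driver_id = row.get("driver_id")
    if isinstance(driver_id, str) and not driver_id.strip():
        errors.append("empty driver_id")
    onboarding = row.get("onboarding_date")
    if isinstance(onboarding, date) and onboarding > date.today():
        errors.append("onboarding_date is in the future")
    return errors


def split_valid_invalid(rows):
    """Partition rows so only clean records move on to the master table."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Rejected rows, together with their problem lists, are what an anomaly trigger would report. The valid rows continue to the merge step.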
.
├── driverMasterData.ipynb # Core notebook for transformation and integration
├── requirements.txt # All required Python libraries
├── LICENSE # AGPL-3.0 License
├── README.md # You're here!
Layer | Tools/Technologies |
---|---|
💻 Programming | Python, SQL (PostgreSQL/MySQL) |
🔄 Automation | Apache Airflow, CronJob |
🧪 Validation | Pandas, Custom Assertions & Schema Checks |
📤 Reporting | Slack API (for alerts), Email Triggers |
🏷️ Storage | Cloud Data Warehouse / RDS (optional) |
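The Reporting row mentions Slack alerts. One common way to post to Slack is an incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. The sketch below assumes that mechanism, although the project may instead use the Slack Web API. The webhook URL, the function names, and the message format are hypothetical placeholders.

```python
import json
import urllib.request


def build_slack_alert(webhook_url, table, rejected_count, missing_sources):
    """Build (but do not send) a Slack incoming-webhook request.

    Hypothetical helper: returns None when there is nothing to report.
    """
    if not rejected_count and not missing_sources:
        return None
    lines = [f":rotating_light: Anomaly report for `{table}`"]
    if rejected_count:
        lines.append(f"- {rejected_count} input row(s) failed validation")
    if missing_sources:
        lines.append("- Missing input sources: " + ", ".join(missing_sources))
    payload = {"text": "\n".join(lines)}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request, timeout=10):
    """POST the prepared request to Slack and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it keeps the message logic testable without network access. `send_alert` performs the actual POST.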
- 🔁 Daily Ingestion of multiple raw input sources (driver metadata, ops history, etc.)
- 🧼 Pre-processing & Validation using `pandas`, ensuring only clean data enters the pipeline
- 🧮 SQL-Optimized Merge operations into the production master table with proper indexing
- 📅 Airflow DAG schedules the job and logs execution metrics
- 🔔 Slack Alerts/Email Triggers notify of anomalies or missing data
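The merge step described above is typically an upsert keyed on the driver identifier. The following sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL/MySQL. The table name `driver_master`, its columns, and the index on `city` are assumptions for illustration. SQLite (3.24+) and PostgreSQL both support `INSERT ... ON CONFLICT ... DO UPDATE`, while MySQL would use `ON DUPLICATE KEY UPDATE` instead.

```python
import sqlite3

# Hypothetical master-table layout; the real schema is not shown in this README.
DDL = """
CREATE TABLE IF NOT EXISTS driver_master (
    driver_id  TEXT PRIMARY KEY,
    city       TEXT NOT NULL,
    is_active  INTEGER NOT NULL,
    updated_on TEXT NOT NULL
);
-- Secondary index to speed up city-level lookups.
CREATE INDEX IF NOT EXISTS idx_driver_master_city ON driver_master (city);
"""

# Insert new drivers and overwrite attributes of existing ones.
UPSERT = """
INSERT INTO driver_master (driver_id, city, is_active, updated_on)
VALUES (:driver_id, :city, :is_active, :updated_on)
ON CONFLICT (driver_id) DO UPDATE SET
    city       = excluded.city,
    is_active  = excluded.is_active,
    updated_on = excluded.updated_on
"""


def open_master_table(path=":memory:"):
    """Open a connection and make sure the table and index exist."""
    conn = sqlite3.connect(path)
    conn.executescript(DDL)
    return conn


def merge_daily_batch(conn, rows):
    """Upsert one day's validated rows in a single transaction."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM driver_master").fetchone()[0]
```

Running the whole batch inside one transaction means a failed load leaves the previous day's table untouched.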
git clone https://github.com/SartHak-0-Sach/driverMasterData_database_table.git
cd driverMasterData_database_table
pip install -r requirements.txt
jupyter notebook driverMasterData.ipynb
Add sample snapshots of the input, final table, or Airflow DAG view here.
Developed end-to-end by Sarthak Sachdev as part of the Driver Issues Team at Battery Smart. I owned this project individually, from initial design to production deployment, and used it in real-world data automation and alerting pipelines. Connect with me on LinkedIn.
This project is licensed under the AGPL-3.0 License.
If you find this useful or want to reference it for your own production pipeline designs, consider giving it a ⭐!