This project demonstrates how to use Support Vector Regression (SVR) to predict restaurant total bills using the well-known Tips dataset. It includes preprocessing (label + one-hot encoding), model training, evaluation, and hyperparameter tuning using GridSearchCV.
The Tips dataset from Seaborn includes information about meals in a restaurant and the corresponding tips. It contains both numerical and categorical variables.
Features:
- total_bill: Total cost of the meal (target)
- tip: Tip amount
- sex: Gender of the customer (Male/Female)
- smoker: Whether the customer smokes (Yes/No)
- day: Day of the week
- time: Time of day (Lunch/Dinner)
- size: Number of people in the party
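As a quick orientation, the sketch below separates the numerical columns from the categorical ones. It uses a few hypothetical stand-in rows that share the dataset's column names, not the real data; with network access, `seaborn.load_dataset("tips")` would return the actual frame.

```python
import pandas as pd

# Hypothetical stand-in rows with the same columns as the Tips dataset.
tips = pd.DataFrame({
    "total_bill": [15.2, 22.8, 9.6, 31.4],
    "tip": [2.5, 3.8, 1.5, 5.0],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "Yes", "No", "No"],
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 1, 4],
})

# Split the column names by dtype: numbers vs. everything else.
numeric_cols = tips.select_dtypes(include="number").columns.tolist()
categorical_cols = tips.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

The categorical columns are the ones that need encoding before they can be passed to SVR.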
- pandas
- numpy
- seaborn
- scikit-learn (SVR, GridSearchCV, LabelEncoder, OneHotEncoder, ColumnTransformer)
- matplotlib or seaborn (for optional visualizations)
- Label Encoding: Applied to binary categorical features (sex, smoker, time)
- One-Hot Encoding: Applied to day using ColumnTransformer with drop='first' to avoid multicollinearity
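The two encoding steps above might look roughly like the following. This is a minimal sketch on hypothetical stand-in rows, not the project's actual code; the transformer name `day_ohe` and the `sparse_threshold=0` setting (used here only to force a dense array) are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical stand-in feature rows (same columns as the Tips features).
X = pd.DataFrame({
    "tip": [1.0, 2.5, 3.0, 4.0],
    "sex": ["Female", "Male", "Male", "Female"],
    "smoker": ["No", "Yes", "No", "Yes"],
    "day": ["Thur", "Fri", "Sat", "Sun"],
    "time": ["Lunch", "Dinner", "Dinner", "Dinner"],
    "size": [2, 3, 2, 4],
})

# Label-encode each binary column; classes are sorted, so e.g. Female=0, Male=1.
for col in ["sex", "smoker", "time"]:
    X[col] = LabelEncoder().fit_transform(X[col])

# One-hot encode 'day', dropping the first category, and pass the
# remaining columns through unchanged.
ct = ColumnTransformer(
    [("day_ohe", OneHotEncoder(drop="first"), ["day"])],
    remainder="passthrough",
    sparse_threshold=0,  # assumption: always return a dense array
)
X_encoded = ct.fit_transform(X)

print(X_encoded.shape)  # 3 one-hot columns for 'day' + 5 passthrough columns
```

In a full pipeline the encoders would be fitted on the training split only and then applied to the test split.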
X = ['tip', 'sex', 'smoker', 'day', 'time', 'size']
y = 'total_bill'
- Performed before encoding to avoid data leakage
- Parameters: test_size=0.2, random_state=42
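A minimal sketch of the feature/target selection and the split, using the parameters listed above. The ten rows are hypothetical stand-ins with the Tips column names; the variable names are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in rows with the Tips schema (not the real data).
tips = pd.DataFrame({
    "total_bill": [12.5, 18.0, 24.3, 9.8, 30.1, 15.6, 21.4, 27.9, 11.2, 19.7],
    "tip": [2.0, 3.0, 4.5, 1.5, 5.0, 2.5, 3.5, 4.0, 1.8, 3.2],
    "sex": ["Female", "Male"] * 5,
    "smoker": ["No", "No", "Yes", "No", "Yes"] * 2,
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat"] * 2,
    "time": ["Lunch", "Dinner"] * 5,
    "size": [2, 2, 4, 1, 5, 2, 3, 4, 2, 3],
})

feature_cols = ["tip", "sex", "smoker", "day", "time", "size"]
X = tips[feature_cols]
y = tips["total_bill"]

# Split the raw (unencoded) data first, so that any encoder fitted later
# only ever sees the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With ten rows and test_size=0.2, eight rows go to training and two to testing.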
- Model: SVR()
- Fitted on encoded training data
- Evaluated on test set
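The training step can be sketched as below. Because the encoded Tips arrays are not reproduced here, the example uses synthetic numeric arrays as a stand-in (an assumption), so its numbers say nothing about the real dataset.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for the encoded train/test matrices (8 columns each).
rng = np.random.default_rng(0)
X_train_enc = rng.normal(size=(80, 8))
y_train = 20 + 3 * X_train_enc[:, 0] + rng.normal(scale=0.5, size=80)
X_test_enc = rng.normal(size=(20, 8))
y_test = 20 + 3 * X_test_enc[:, 0] + rng.normal(scale=0.5, size=20)

# Default SVR: RBF kernel, C=1.0, epsilon=0.1.
model = SVR()
model.fit(X_train_enc, y_train)

# Predict on held-out rows; score() returns R-squared for regressors.
y_pred = model.predict(X_test_enc)
test_r2 = model.score(X_test_enc, y_test)

print(y_pred[:3], test_r2)
```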
Initial SVR Results:
- R-squared Score: 0.5502
- Mean Absolute Error (MAE): 4.41
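For readers unfamiliar with the two metrics, here is a small worked example on made-up numbers (not the project's predictions). It computes both by hand and checks them against scikit-learn's `r2_score` and `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up values, chosen only to keep the arithmetic easy to follow.
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 18.0, 33.0])

# MAE: mean of absolute errors = (2 + 2 + 3) / 3
mae_manual = np.mean(np.abs(y_true - y_pred))

# R-squared: 1 - SS_res / SS_tot = 1 - 17 / 200
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)

print(mae, r2)
```

MAE is expressed in the target's units (here, the bill amount), while R-squared is the fraction of the target's variance that the model explains.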
Using GridSearchCV
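The tuning step might be set up along these lines. The parameter grid, the number of folds, and the scoring metric are all hypothetical, since the values the project actually searched are not listed here; the data is again a synthetic stand-in for the encoded training set.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the encoded training data.
rng = np.random.default_rng(0)
X_train_enc = rng.normal(size=(60, 8))
y_train = 20 + 3 * X_train_enc[:, 0] + rng.normal(scale=0.5, size=60)

# Hypothetical search space; adjust to the problem at hand.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "epsilon": [0.1, 0.5],
}

# 5-fold cross-validation on the training data only, scored by R-squared.
grid = GridSearchCV(SVR(), param_grid, cv=5, scoring="r2")
grid.fit(X_train_enc, y_train)

print(grid.best_params_)
best_model = grid.best_estimator_  # refitted on all training rows
```

The test set stays untouched during the search, so it can still give an unbiased estimate when `best_model` is evaluated afterwards.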
This project showcases the application of SVR to realistic data, along with:
- Proper feature engineering (label + one-hot encoding)
- Avoiding data leakage
- Hyperparameter tuning for optimization

While performance could still be improved with more complex models or feature engineering, this provides a strong foundation for regression modeling with scikit-learn.
Mai3Prabhu