Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns

1. Project Overview

This project leverages machine learning to predict whether a customer will subscribe to a bank's term deposit based on data collected from direct marketing campaigns. By analyzing features such as customer demographics, previous interactions, and financial data, we aim to optimize marketing strategies for future campaigns.

This repository contains the complete pipeline, from data preprocessing and feature engineering through model building and hyperparameter tuning to model evaluation.

2. Dataset

The dataset used is from a Portuguese banking institution, consisting of 41,188 instances and 20 features. It contains customer data and outcomes from direct marketing campaigns involving phone calls. Key features include:

Customer Attributes: age, job, marital, education, balance, housing, loan, etc.
Contact Attributes: contact type (telephone, cellular), last contact day, duration, etc.
Previous Campaign Data: pdays, previous, poutcome (outcome of the previous campaign).
Target Variable: subscribed (whether the customer subscribed to a term deposit).

Dataset Preprocessing

Handled missing values using median imputation for numerical features and a default value for categorical features.
Encoded categorical variables using One-Hot Encoding.
Applied Min-Max scaling to normalize continuous features.
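
The three preprocessing steps above can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: the helper names, the "unknown" default category, and the toy values are assumptions made for illustration. In practice, scikit-learn's SimpleImputer, OneHotEncoder, and MinMaxScaler cover the same steps.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values, default="unknown"):
    """Fill missing categories with a default, then emit one 0/1 column per category."""
    filled = [default if v is None else v for v in values]
    categories = sorted(set(filled))
    return {c: [1 if v == c else 0 for v in filled] for c in categories}

# Toy columns (made up for illustration).
ages = [30, None, 50, 40]
jobs = ["admin.", None, "technician", "admin."]

scaled_age = min_max_scale(impute_median(ages))
job_columns = one_hot(jobs)
```

When a train/test split is involved, the median and the min/max would normally be computed on the training split only and then reused on the test split.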

3. Exploratory Data Analysis (EDA)

Objective: Identify key patterns and relationships between features and the target variable.

Correlation Matrix: Assessed correlations between numerical features and the target variable.
Univariate and Bivariate Analysis: Visualized distributions of important features (e.g., age, balance) and their relationships with the target.
Class Imbalance: The dataset is highly imbalanced, with only ~11% positive class (i.e., subscribed). Addressed class imbalance using SMOTE (Synthetic Minority Over-sampling Technique).
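
To make the SMOTE step concrete, the sketch below shows the core idea only: a synthetic minority point is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the repository's implementation; the function name and toy points are assumptions, and a library such as imbalanced-learn would normally be used instead.

```python
import random

def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to the base point.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation factor in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]

# Toy minority-class points (made up for illustration).
minority_points = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
synthetic = smote_sample(minority_points)
```

Oversampling is usually applied to the training split only, so that the evaluation data keeps the original class distribution.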

4. Feature Engineering

Interaction Features: Created new interaction terms between balance and pdays to capture potential non-linear relationships.
Domain-Specific Features: Developed features such as contact rate per campaign and balance-duration ratio.
Temporal Features: Derived features based on the day of the week and time of contact to account for possible temporal effects on subscription likelihood.
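
The derived features above might look roughly like the following sketch. The exact definitions used in the project are not stated, so the formulas, the output column names, and the `contact_date` field are assumptions for illustration.

```python
from datetime import date

def engineer_features(row):
    """Return a copy of row with illustrative derived features added."""
    out = dict(row)
    # Interaction term between balance and pdays.
    # If pdays uses a sentinel value for clients never contacted before,
    # that value would need handling before multiplying.
    out["balance_x_pdays"] = row["balance"] * row["pdays"]
    # Balance-duration ratio; guard against zero-length calls.
    out["balance_per_second"] = row["balance"] / row["duration"] if row["duration"] else 0.0
    # Day of week from a hypothetical ISO contact date (0 = Monday).
    out["contact_weekday"] = date.fromisoformat(row["contact_date"]).weekday()
    return out

# Toy record (made up for illustration).
example = engineer_features(
    {"balance": 1200, "pdays": 5, "duration": 300, "contact_date": "2024-03-15"}
)
```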

5. Model Development

Baseline Models:

Logistic Regression: As a baseline for comparison.
Decision Trees: For interpretable predictions.

Advanced Models:

Random Forest: Robust model for handling non-linear relationships and feature importance analysis.
XGBoost: Gradient boosting for better generalization and handling of imbalanced classes.
CatBoost: Evaluated due to its efficiency in handling categorical features without explicit encoding.

Hyperparameter Tuning:

Utilized GridSearchCV and RandomizedSearchCV for hyperparameter optimization:

Random Forest: Tuned n_estimators, max_depth, and min_samples_split.
XGBoost: Tuned learning_rate, max_depth, n_estimators, and subsample.
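
The difference between the two search strategies can be sketched without any ML library: grid search scores every combination in the grid, while randomized search scores only a fixed number of sampled combinations. The grid values and `toy_score` below are illustrative stand-ins, not the settings or scores used in this project, and cross-validation is left out for brevity.

```python
import itertools
import random

# Illustrative grid over the XGBoost parameters named above.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [3, 5],
    "n_estimators": [100, 300],
    "subsample": [0.8, 1.0],
}

def toy_score(params):
    """Stand-in for a cross-validated score; peaks at one arbitrary setting."""
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5) / 10

def all_candidates(grid):
    """Yield every combination in the grid, as grid search does."""
    keys = list(grid)
    for combo in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, combo))

def grid_search(grid, score):
    """Return the best-scoring combination after trying all of them."""
    return max(all_candidates(grid), key=score)

def random_search(grid, score, n_iter, seed=0):
    """Return the best of n_iter randomly sampled combinations."""
    rng = random.Random(seed)
    sampled = [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n_iter)]
    return max(sampled, key=score)

n_candidates = len(list(all_candidates(param_grid)))  # 2 * 2 * 2 * 2 combinations
best_grid = grid_search(param_grid, toy_score)
best_random = random_search(param_grid, toy_score, n_iter=5)
```

Randomized search trades completeness for cost: with larger grids it evaluates far fewer candidates, at the risk of missing the best combination.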

Model Evaluation Metrics:

Accuracy: Simple baseline comparison.
Precision: Focus on minimizing false positives in this business context.
Recall: Important to avoid missing potential customers likely to subscribe.
F1-Score: Harmonic mean of precision and recall to balance both.
ROC-AUC Score: Evaluated the model's ability to discriminate between the classes.
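
For reference, the metrics above can be computed from confusion-matrix counts and model scores as in this minimal sketch. The counts and scores are made up for illustration and are not results from this project.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up counts and scores for illustration only.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=880)
auc = roc_auc(scores_pos=[0.9, 0.4], scores_neg=[0.5, 0.1])
```

With these toy counts, accuracy is high (0.94) even though recall is only 0.60, which is why accuracy alone is a weak guide on an imbalanced dataset.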

Handling Imbalance:

Implemented SMOTE to oversample the minority class and improve recall.
Tested class weights adjustment to further balance precision and recall.
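
As an illustration of class-weight adjustment, the sketch below computes "balanced" weights, where each class is weighted by n_samples / (n_classes * class_count), along with the negative-to-positive ratio that is a common starting value for XGBoost's scale_pos_weight. The toy labels are an assumption; the weights actually used in this project are not stated.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

# Toy labels with roughly the 11% positive rate described above.
labels = [1] * 11 + [0] * 89
weights = balanced_class_weights(labels)

# Ratio of negatives to positives, a common starting value for scale_pos_weight.
neg_to_pos = labels.count(0) / labels.count(1)
```

Unlike SMOTE, class weighting leaves the data unchanged and instead makes errors on the minority class more costly during training.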

6. Results and Insights

The best-performing model was XGBoost, achieving:
- Accuracy: 90.5%
- Precision: 75.6%
- Recall: 68.3%
- F1-Score: 71.8%
- ROC-AUC: 92.2%
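
As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values:

```python
precision, recall = 0.756, 0.683  # reported values above
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.1%}")  # prints 71.8%
```
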

Feature Importance (from XGBoost):
1. duration: The duration of the last contact.
2. pdays: Number of days since the client was last contacted.
3. balance: Customer's account balance.
4. campaign: Number of contacts during the current campaign.
5. job: Customer’s occupation.

The duration of the last contact was the most influential predictor, indicating the importance of engagement time in a successful subscription.

7. Deployment and Next Steps

Model Deployment:

The final model is deployed as a Flask API. It accepts customer data as input and returns the likelihood of subscription.
Dockerized the API for easy integration with other banking systems.
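
A prediction endpoint of this kind could look roughly like the sketch below. It is not the repository's app: the `/predict` route, the response field name, and the `predict_probability` placeholder are assumptions, and the real service would load the saved model rather than return a fixed score.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_probability(features):
    """Placeholder for the trained model; returns a fixed score."""
    # Assumption: the real app would load a saved model and call its predict_proba.
    return 0.5

@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; silent=True yields None instead of raising on bad input.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify({"error": "expected a JSON object of customer features"}), 400
    return jsonify({"subscription_probability": predict_probability(payload)})
```

Starting the server (for example with app.run()) is left out of the sketch. A client would POST a JSON object of customer features and read the probability from the response.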

Potential Improvements:

Experiment with neural networks to capture more complex patterns in high-dimensional data.
Integrate real-time data to make the model adaptive to changing customer behaviors and market trends.
Implement an A/B testing framework to continuously validate and improve the model in production.

8. Repository Structure

├── data/                     # Dataset and data processing scripts
├── notebooks/                # Jupyter notebooks for EDA and model building
├── models/                   # Saved models and model training scripts
├── app/                      # Flask app for deployment
├── Dockerfile                # Docker configuration
├── README.md                 # Project documentation
└── requirements.txt          # List of dependencies

9. How to Run the Project

Clone the repository:

git clone https://github.com/Gourav052003/Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns.git
cd Predicting-Customer-Engagement-in-Financial-Products-Insights-from-Marketing-Campaigns

Install dependencies:

pip install -r requirements.txt

Run the Jupyter notebook to train models:

jupyter notebook notebooks/Bank_Term_Deposit_Prediction.ipynb

Run the Flask API for predictions:

cd app
python app.py

Key Technical Enhancements:

Detailed descriptions of models, algorithms, and hyperparameter tuning techniques.
Emphasis on dealing with class imbalance using methods like SMOTE and class weight adjustments.
Feature engineering techniques that demonstrate data-driven decision-making.
Comprehensive evaluation metrics showing performance beyond just accuracy, including precision, recall, F1-score, and ROC-AUC.
Future work that hints at more complex methods (e.g., neural networks, real-time predictions) and production considerations (e.g., Dockerization, API deployment).
