This project aims to build a robust email spam detection system using machine learning techniques. The primary objective is to classify emails as spam or not spam with high accuracy. We achieved 95% accuracy using a logistic regression model with feature engineering and TF-IDF vectorization.
- df.isnull().sum()
- The isnull function flags the null values in the DataFrame, and the sum function adds them up, giving the number of null values in each column.
- Used LabelEncoder() to convert categorical labels (text) into numeric form.
- LabelEncoder is a class in the sklearn.preprocessing module of scikit-learn.
- Utilized the nltk toolkit to preprocess text data by employing functions such as sent_tokenize and word_tokenize.
- As a result, I enriched my DataFrame with three new columns: number_characters, number_sentences, and number_words.
- The model is trained using logistic regression.
- Accuracy is the metric used to evaluate the model's performance.
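
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the toy dataset, the column names `label` and `text`, the helper `add_text_features`, and the split and model settings are all hypothetical. Only the three feature column names come from the steps above, and the sketch trains on TF-IDF features alone.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real notebook loads its own dataset.
ham = [
    "Are we still meeting for lunch today?",
    "Please review the attached report before Monday.",
    "Thanks for your help with the project.",
    "Can you call me when you get home?",
]
spam = [
    "Win a free prize now! Click here to claim.",
    "Congratulations, you won cash. Reply to claim your reward.",
    "Cheap loans approved instantly. Click now!",
    "Free entry to win a prize. Text WIN now.",
]
df = pd.DataFrame({
    "label": ["ham"] * 12 + ["spam"] * 12,
    "text": ham * 3 + spam * 3,
})

# Count null values per column.
print(df.isnull().sum())

# Encode the text labels as integers (classes are sorted, so ham=0, spam=1).
df["label"] = LabelEncoder().fit_transform(df["label"])


def add_text_features(frame):
    """Return a copy of frame with the three count columns added.

    NLTK is imported lazily because tokenizing needs the Punkt data
    (nltk.download("punkt"), or "punkt_tab" on recent NLTK releases).
    """
    import nltk

    out = frame.copy()
    out["number_characters"] = out["text"].apply(len)
    out["number_words"] = out["text"].apply(
        lambda t: len(nltk.word_tokenize(t)))
    out["number_sentences"] = out["text"].apply(
        lambda t: len(nltk.sent_tokenize(t)))
    return out


# Hold out a stratified test set (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"],
    test_size=0.25, random_state=42, stratify=df["label"],
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train logistic regression and score it with accuracy.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_vec, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test_vec))
print(f"Accuracy: {accuracy:.2f}")
```

Calling `add_text_features(df)` adds the three count columns once the NLTK tokenizer data is installed. Fitting the vectorizer on the training split only keeps test-set vocabulary from leaking into the model.
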
- Python 3.x
- Pandas
- NumPy
- scikit-learn
- Jupyter
- NLTK
- WordCloud
git clone https://github.com/raja045/Email-Spam-Detection-Using-Logistic-Regression.git
cd Email-Spam-Detection-Using-Logistic-Regression
pip install -r requirements.txt
jupyter notebook
- Open `SpamDetection.ipynb` and run the cells sequentially.