This project analyzes the sentiment of tweets about U.S. airlines using the Twitter US Airline Sentiment dataset from Kaggle. The analysis covers data cleaning, exploratory data analysis, feature extraction, model training, and evaluation.
The goal of this project is to classify tweets into sentiment categories (positive, negative, and neutral) using machine learning and natural language processing (NLP). The workflow includes:
- Data cleaning and preprocessing
- Feature engineering (e.g., TF-IDF)
- Model training with classifiers (e.g., Logistic Regression, Random Forest)
- Evaluation of model performance using common metrics
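To make the cleaning and TF-IDF steps concrete, here is a minimal standard-library sketch. It is not the notebook's code: the stop-word list, the `clean_tweet` and `tfidf` helpers, and the example tweets are all made up for illustration, and the weighting uses the plain textbook formula rather than the smoothed variant a library such as scikit-learn applies.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "was", "to", "my", "and", "of", "for", "on"}

def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # keep letters and whitespace only
    return [tok for tok in text.split() if tok not in STOP_WORDS]

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / len(doc) and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up example tweets, not taken from the dataset.
tweets = [
    "@united my flight was delayed AGAIN!! http://t.co/abc",
    "@JetBlue thanks for the great flight :)",
    "@united is the flight on time?",
]
tokens = [clean_tweet(t) for t in tweets]
vectors = tfidf(tokens)
print(tokens[0])
print(vectors[0])
```

A term that appears in every tweet ("flight" here) gets a weight of zero, which is the point of the IDF factor: it pushes down words that do not help tell tweets apart.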
The dataset used is the "Twitter US Airline Sentiment" dataset available on Kaggle. It contains tweets, their sentiment labels, and additional metadata.
- Download Link: Kaggle Dataset
For more details on the dataset, refer to the Kaggle page.
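As a rough sketch of a first look at the data, the snippet below reads tweet text and labels and counts how often each sentiment occurs. It assumes the CSV is named `Tweets.csv` and has `text` and `airline_sentiment` columns; check both against the Kaggle page. The `load_tweets` helper and the sample rows are hypothetical stand-ins so the example runs without the download.

```python
import csv
import io
from collections import Counter

def load_tweets(csv_file):
    """Read (text, sentiment) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return [(row["text"], row["airline_sentiment"]) for row in reader]

# Made-up rows standing in for the real file. With the dataset in place
# (file name is an assumption), this would be:
#   with open("data/Tweets.csv", newline="", encoding="utf-8") as f:
#       rows = load_tweets(f)
sample = io.StringIO(
    "airline_sentiment,airline,text\n"
    "negative,United,@united my flight was delayed again\n"
    "positive,JetBlue,@JetBlue thanks for the great flight\n"
    "negative,Delta,@Delta lost my bag\n"
    "neutral,United,@united is the flight on time?\n"
)
rows = load_tweets(sample)

# Class balance matters later when reading accuracy figures.
label_counts = Counter(label for _, label in rows)
print(label_counts.most_common())
```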
- Data Preprocessing: Clean the text data by removing noise (punctuation, stop words, etc.) and normalize the tweets.
- Feature Engineering: Transform text data into numerical features using techniques like TF-IDF.
- Modeling: Train machine learning models (e.g., Logistic Regression, Random Forest) on the processed data.
- Evaluation: Evaluate the models using accuracy, precision, recall, and F1-score.
- Visualization: Use libraries like Matplotlib and Seaborn to visualize sentiment distributions and model performance.
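The evaluation metrics listed above can be computed by hand, which helps when reading a per-class report. The sketch below is illustrative only: `classification_metrics` is a hypothetical helper, and the labels and predictions are made up rather than taken from any model in this project. In practice scikit-learn's `classification_report` produces the same kind of table.

```python
def classification_metrics(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class

# Made-up labels and predictions for illustration only.
y_true = ["negative", "negative", "neutral", "positive", "negative", "positive"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative", "negative"]

accuracy, per_class = classification_metrics(y_true, y_pred)
print(f"accuracy: {accuracy:.3f}")
for label, scores in per_class.items():
    print(label, {name: round(value, 3) for name, value in scores.items()})
```

Reporting the scores per class rather than accuracy alone is useful when one sentiment dominates the data, since a model can score well on accuracy while missing most of a smaller class.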
- Clone the Repository:
    git clone https://github.com/Naso7y/twitter-sentiment-analysis.git
    cd twitter-sentiment-analysis
- Set Up a Virtual Environment (Optional but Recommended):
    python -m venv env
    source env/bin/activate  # On Windows: env\Scripts\activate
- Install Dependencies:
    pip install -r requirements.txt
  The requirements.txt file lists the essential libraries the project depends on.
- Download the Dataset: Download the dataset from Kaggle and place the CSV file into the data/ folder.
- Download spaCy Model:
    python -m spacy download en_core_web_sm
- Run the Analysis Notebook: Navigate to the notebooks/ directory and open the Jupyter Notebook:
    jupyter notebook Twitter_Sentiment_Analysis.ipynb
- Follow the Notebook Steps: The notebook guides you through data preprocessing, model training, evaluation, and visualization.
twitter-sentiment-analysis/
├── Twitter_Sentiment_Analysis.ipynb # Jupyter Notebook for analysis
├── requirements.txt # List of required Python libraries
└── README.md
- Kaggle Dataset: Twitter US Airline Sentiment
- pandas Documentation: pandas
- scikit-learn Documentation: scikit-learn
- spaCy Documentation: spaCy
- Matplotlib Documentation: Matplotlib
I welcome all contributions! Feel free to fork the repository, submit issues, or create pull requests.
For any questions or feedback, feel free to reach out:
- GitHub: NASO7Y
- Email: ahmed.noshy2004@gmail.com
- LinkedIn: Ahmed Noshy
⭐ If you find this project helpful, consider giving it a star to show your support 😂🌹