This project develops a hybrid recommender system for Chicago Airbnb listings using data from Inside Airbnb. The system recommends listings based on three main preferences:
- 👥 Guest count
- 💲 Price point
- 📝 Text comments
Each recommendation includes the listing's location and its distance to the nearest CTA train station, so guests can judge transit access.
- Listings: 7,952 listings with attributes such as amenities, price, and location.
- Reviews: 417,795 reviews, used for sentiment analysis.
- Train stations: 145 CTA train stations, used to compute distances from Airbnb listings.
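
As a rough illustration of the listing-to-station distance step, the sketch below finds the nearest station with the haversine formula using only the standard library. The project's library list mentions geopy, so the actual computation may differ. The function names and coordinates are illustrative assumptions, not project data.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(listing, stations):
    """Return (station_name, distance_km) for the station closest to the listing.

    listing is a (lat, lon) tuple; stations is a list of (name, lat, lon) tuples.
    """
    best = min(stations, key=lambda s: haversine_km(listing[0], listing[1], s[1], s[2]))
    return best[0], haversine_km(listing[0], listing[1], best[1], best[2])


# Illustrative coordinates only (hypothetical, not taken from the project data).
stations = [
    ("Station A", 41.8857, -87.6309),
    ("Station B", 41.9474, -87.6536),
    ("Station C", 41.7950, -87.6180),
]
listing = (41.8900, -87.6300)

name, dist_km = nearest_station(listing, stations)
print(f"{name}: {dist_km:.2f} km")
```

With 7,952 listings and 145 stations, a brute-force scan like this is about 1.15 million distance evaluations, which is small enough that no spatial index is needed.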
🔹 Data visualization and attribute relationship analysis.
🔹 Sentiment analysis using VADER and machine learning models (Logistic Regression, Linear SVC, Naive Bayes) to understand guest feedback.
🔹 Hierarchical clustering of Airbnb listings based on name and description embeddings to identify characteristic patterns.
🔹 A recommender system that uses embeddings and cosine similarity to tailor recommendations to user preferences.
✅ VADER sentiment scores benchmarked against: Logistic Regression, Linear SVC, and Naive Bayes.
📌 Sentiment classes: Positive, Neutral, and Negative.
📌 Final sentiment score computed using a weighted average approach.
📌 Key Steps:
- Clean the review dataset.
- Address dataset skew (the majority of listings have high ratings).
- Score review text with sentiment analysis techniques such as VADER.
- Generate word clouds to visualize sentiment trends.
- Libraries Used: nltk, pandas, matplotlib, wordcloud
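
The labelling and aggregation steps might look like the sketch below. It maps a VADER compound score to the three sentiment classes using the commonly used ±0.05 cut-offs, then combines several review scores with a weighted average. The thresholds, the choice of weights (review length in words), and the function names are assumptions for illustration; the project's actual weighting scheme is not stated.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment class."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


def weighted_sentiment(scores, weights):
    """Weighted average of sentiment scores; weights need not sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


# In practice the compound scores would come from NLTK, e.g.
#   SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
# (this needs the "vader_lexicon" resource). They are hard-coded here so the
# sketch runs offline.
review_scores = [0.92, 0.40, -0.60, 0.01]
# Hypothetical weights: review length in words, so longer reviews count more.
review_weights = [120, 45, 30, 5]

labels = [label_from_compound(s) for s in review_scores]
listing_score = weighted_sentiment(review_scores, review_weights)
print(labels, round(listing_score, 3))
```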
✅ Listings were grouped into three clusters based on semantic similarities.
1️⃣ Vibrant Social Spaces: Near entertainment and nightlife.
2️⃣ Cultural and Scenic Attractions: Near museums, parks, and historical sites.
3️⃣ Accessible with Public Transportation: Easy access to CTA train stations.
📌 Key Steps:
- Consolidate data from various sources.
- Process and clean text-based listing information.
- Apply Agglomerative Clustering to group similar listings.
- Visualize clustering results.
- Libraries Used: pandas, numpy, sklearn, spacy, matplotlib, geopy
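
A minimal sketch of the clustering step, assuming each listing's name and description have already been turned into an embedding vector. Random toy vectors stand in for real embeddings, and the linkage choice (scikit-learn's default Ward linkage on L2-normalised vectors) is an assumption, since the project's linkage and distance settings are not stated.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Stand-in for listing embeddings: three noisy groups around distinct directions.
# Real embeddings would come from a text model (spaCy appears in the library list).
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
embeddings = np.vstack([c + 0.05 * rng.standard_normal((5, 3)) for c in centers])

# L2-normalise each row so that Euclidean distance tracks cosine distance.
unit = normalize(embeddings)

# Three clusters, matching the number of themes reported above.
model = AgglomerativeClustering(n_clusters=3)  # default Ward linkage
labels = model.fit_predict(unit)
print(labels)
```

Normalising first keeps the example independent of scikit-learn version differences in how a cosine metric is passed to `AgglomerativeClustering`.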
✅ Implement a content-based recommendation model.
📌 Key Steps:
1️⃣ User Input & Preferences: Collects guest count, price, amenities, and text input to extract relevant features.
2️⃣ Filtering Listings: Filters listings based on user preferences.
3️⃣ Enhancing Rankings: Ranks the filtered listings by a score that combines sentiment with cosine similarity.
4️⃣ Personalized Recommendations: Outputs top-k listings from different clusters.
5️⃣ Geospatial Visualization: Displays results using an interactive Folium map.
- Libraries Used: pandas, numpy, sklearn, folium, spacy, geopandas
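
The filtering, ranking, and top-k steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: the field names, the `alpha` weight, the sentiment scale of [0, 1], and the rule of at most one listing per cluster are all hypothetical, and similarity is computed here between a query embedding and each listing embedding.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(listings, query_vec, guests, max_price, k=3, alpha=0.7):
    """Filter, score, and return up to k listings, at most one per cluster."""
    # Hard filters from the user's guest count and price preferences.
    candidates = [
        item for item in listings
        if item["accommodates"] >= guests and item["price"] <= max_price
    ]

    # Combined score: alpha * similarity + (1 - alpha) * sentiment (hypothetical weighting).
    def score(item):
        sim = cosine_similarity(query_vec, item["embedding"])
        return alpha * sim + (1 - alpha) * item["sentiment"]

    picked, seen_clusters = [], set()
    for item in sorted(candidates, key=score, reverse=True):
        if item["cluster"] in seen_clusters:
            continue  # keep the results spread across clusters
        picked.append(item)
        seen_clusters.add(item["cluster"])
        if len(picked) == k:
            break
    return picked


# Toy data for illustration only.
listings = [
    {"id": "A", "accommodates": 4, "price": 150, "sentiment": 0.90, "cluster": 0, "embedding": [1.0, 0.0]},
    {"id": "B", "accommodates": 4, "price": 120, "sentiment": 0.80, "cluster": 0, "embedding": [0.9, 0.1]},
    {"id": "C", "accommodates": 2, "price": 90, "sentiment": 0.95, "cluster": 1, "embedding": [1.0, 0.0]},
    {"id": "D", "accommodates": 5, "price": 180, "sentiment": 0.70, "cluster": 1, "embedding": [0.6, 0.8]},
    {"id": "E", "accommodates": 6, "price": 400, "sentiment": 0.99, "cluster": 2, "embedding": [1.0, 0.0]},
]

top = recommend(listings, query_vec=[1.0, 0.0], guests=3, max_price=200, k=2)
print([item["id"] for item in top])
```

In this toy run, listing C is excluded for capacity and E for price. B scores well but shares a cluster with A, so the second slot goes to a listing from a different cluster.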
✅ Best sentiment analysis model: Linear SVC (92.45% accuracy).
✅ Clustering provided interpretable themes for better recommendations.
✅ Embedding-based similarity scoring improved personalized recommendations.
🔹 Implement dynamic weighting for ranking.
🔹 Integrate transformer-based models for better NLP understanding.
🔹 Expand the system to include real-time updates and user feedback mechanisms.
🔹 🏠 Chicago Airbnb Listings Map
🔹 🚇 Chicago Train Stations Map (CTA)
👨‍💻 The author independently developed all aspects of the project, including data preprocessing, analysis, model development, and report creation.