Final project for Harvard CS109
The README file gives an overview of what we are handing in: our project notebook, any non-standard Python libraries we used, and URLs to our project websites and screencast videos.
Project Notebook:
Main.ipynb is the project notebook that we hand in. It includes all the analysis we have conducted. (Follow the link for nbviewer version.)
All the data are saved in raw_data:
stop_word.txt: the list of stop words that we used in one LDA analysis.
emoji-data.txt: the table of emoji and what each emoji represents
negative-words.txt: the list of negative words
positive-words.txt: the list of positive words
Raw data scraped from the Twitter API can be found here: Raw Data on Google Drive
Working dataframe (
) can be found here: Raw Data on Google Drive -
is processed data for LDA I analysis.
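As a rough illustration of how the positive and negative word lists above might be used, here is a minimal sketch of a word-count sentiment score. It is not taken from Main.ipynb. The helper names, the assumption of one word per line, the skipping of lines that start with ";", and the raw_data/ path in the comment are all assumptions made for this example.

```python
def load_word_list(lines):
    """Return a set of lowercase words, skipping blank lines and ';' comments.

    Assumption: the word-list files hold one word per line.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def sentiment_score(text, positive, negative):
    """Count positive tokens minus negative tokens in whitespace-split text."""
    tokens = text.lower().split()
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos - neg


# In practice the lines would come from the files, for example (assumed path):
#   with open("raw_data/positive-words.txt") as f:
#       positive = load_word_list(f)
positive = load_word_list(["; comment line", "good", "great", ""])
negative = load_word_list(["bad", "awful"])

print(sentiment_score("great game and good food but bad weather", positive, negative))
```

A score above zero suggests more positive than negative words; punctuation handling and negation are deliberately left out of this sketch.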
Python Libraries:
pytz: allows us to convert times to the correct timezone.
gensim: text-processing package used for topic modeling
ast: package that allows us to use user-defined functions in Spark
tweepy: package that allows us to scrape Twitter data
xlrd: package that extracts data from Excel spreadsheets
nltk: Natural Language Toolkit
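To make the role of ast more concrete, the following sketch shows one way it is commonly used inside a per-record function: ast.literal_eval turns a stringified Python literal back into an object. That the notebook uses literal_eval this way is an assumption. The record layout, the parse_record name, and the path variable in the Spark comment are hypothetical.

```python
import ast


def parse_record(line):
    """Parse a stringified Python literal; return None if it is malformed."""
    try:
        return ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None


# Hypothetical stringified tweet record, as it might appear in a text file.
raw = "{'id': 1, 'text': 'hello world', 'hashtags': ['cs109']}"
record = parse_record(raw)
print(record["hashtags"])

# With Spark, the same function could be applied to each line, for example:
#   rdd = sc.textFile(path).map(parse_record).filter(lambda r: r is not None)
```

Unlike eval, literal_eval only accepts literals (strings, numbers, lists, dicts, and so on), so it does not execute arbitrary code found in the data.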
Project websites:
Project Screencast: