Final project for Harvard CS109
This README gives an overview of what we are handing in: our project notebook, the non-standard Python libraries we used, and URLs to our project website and screencast video.
Project Notebook:
Main.ipynb is the project notebook we are handing in. It includes all the analysis we conducted. (Follow the link for the nbviewer version.)
Data:
All the data are saved in the raw_data folder:
- stop_word.txt: the list of stop words we used in one LDA analysis.
- emoji-data.txt: the table of emoji and what each emoji represents.
- negative-words.txt: the list of negative words.
- positive-words.txt: the list of positive words (a loading sketch is shown after this list).
- Raw data scraped from the Twitter API can be found here: Raw Data on Google Drive
- Working dataframes (dftokens.csv, dftweets.csv) can be found here: Raw Data on Google Drive
- dftokens.xls is the processed data for the LDA I analysis.
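As a rough illustration of how the lexicon files above might be loaded, here is a minimal sketch. It assumes the files contain one token per line (the Hu & Liu sentiment lexicons also start with ";" comment lines and use latin-1 encoding); the simple positive-minus-negative score is only illustrative, not necessarily the scoring used in the notebook.

```python
# Minimal sketch (file format assumed: one token per line, ';' lines are comments).
def load_wordlist(path):
    words = set()
    with open(path, encoding="latin-1") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith(";"):
                words.add(line)
    return words

stop_words = load_wordlist("raw_data/stop_word.txt")
positive_words = load_wordlist("raw_data/positive-words.txt")
negative_words = load_wordlist("raw_data/negative-words.txt")

def sentiment_score(tokens):
    """Naive score for a tokenized tweet: positive hits minus negative hits."""
    return sum(t in positive_words for t in tokens) - sum(t in negative_words for t in tokens)
```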
Python Libraries:
- pytz: allows us to convert times to the correct timezone.
- gensim: word-processing and topic-modeling package.
- ast: allows us to use user-defined functions in Spark.
- tweepy: allows us to scrape Twitter data (see the sketch after this list).
- xlrd: extracts data from Excel spreadsheets.
- nltk: the Natural Language Toolkit.
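To show how the scraping and timezone libraries fit together, here is a minimal sketch. The credentials, the "thanksgiving" query, and the US/Eastern timezone are placeholder assumptions, and tweepy's search method name differs across versions; the actual scraping code lives in the notebook.

```python
import tweepy
import pytz

# Placeholder credentials; tweepy < 4 exposes api.search(...),
# while tweepy >= 4 renames it api.search_tweets(...).
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

eastern = pytz.timezone("US/Eastern")

for tweet in api.search(q="thanksgiving", count=100):
    created = tweet.created_at
    # Older tweepy returns naive UTC datetimes; localize before converting.
    if created.tzinfo is None:
        created = pytz.utc.localize(created)
    print(created.astimezone(eastern), tweet.text[:80])
```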
Project Website: http://thanksgivingontwitter.weebly.com/
Project Screencast: https://www.youtube.com/watch?v=UCImCJhoTgc