cs109_final

Final project for Harvard CS109

This README gives an overview of what we are handing in: our project notebook, the data, any non-standard Python libraries we used, and the URLs of our project website and screencast video.

Project Notebook:

Main.ipynb is the project notebook that we are handing in. It contains all of the analysis we conducted. (Follow the link for the nbviewer version.)

Data: All the data are saved in the raw_data folder:

  1. stop_word.txt: the list of stop words that we used in one of our LDA analyses.

  2. emoji-data.txt: a table of emoji and what each emoji represents.

  3. negative-words.txt: the list of negative words.

  4. positive-words.txt: the list of positive words.

  5. Raw data scraped from the Twitter API can be found here: Raw Data on Google Drive

  6. The working dataframes (dftokens.csv, dftweets.csv) can be found here: Raw Data on Google Drive

  7. dftokens.xls: processed data for the LDA I analysis.
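The positive and negative word lists are the kind of input used for simple lexicon-based sentiment scoring. The sketch below shows one way such lists could be loaded and applied to a tokenized tweet; it is not necessarily the method in Main.ipynb. The helper names `load_word_list` and `sentiment_score` are hypothetical, and it assumes each file holds one word per line with lines starting with ";" treated as comments (as in the widely used Hu & Liu opinion lexicon).

```python
# Minimal lexicon-based sentiment sketch (hypothetical helpers, not the notebook's code).
# Assumption: one word per line; lines starting with ";" are comments.

def load_word_list(lines):
    """Return a set of lowercase words from an iterable of text lines."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        # Skip blank lines and comment lines
        if word and not word.startswith(";"):
            words.add(word)
    return words


def sentiment_score(tokens, positive, negative):
    """Score = (# positive tokens) - (# negative tokens), case-insensitive."""
    score = 0
    for tok in tokens:
        t = tok.lower()
        if t in positive:
            score += 1
        elif t in negative:
            score -= 1
    return score


if __name__ == "__main__":
    # With the real files you might instead pass an open file handle, e.g.
    # open("raw_data/positive-words.txt") (path and encoding are assumptions).
    positive = load_word_list(["; comment line", "happy", "thankful"])
    negative = load_word_list(["sad", "tired"])
    tweet = "So thankful and happy but tired".split()
    print(sentiment_score(tweet, positive, negative))  # 1
```

Counting matches is the simplest possible design; it ignores negation ("not happy") and emoji, which is one reason a separate emoji table can be useful.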

Python Libraries:

  1. pytz: allows us to convert timestamps to the correct time zone.

  2. gensim: topic-modeling and text-processing package.

  3. ast: standard-library module that allows us to use user-defined functions in Spark.

  4. tweepy: package that allows us to scrape Twitter data.

  5. xlrd: package that reads data from Excel spreadsheets.

  6. nltk: Natural Language Toolkit.
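The `ast` module is part of Python's standard library. One common reason to pair it with Spark user-defined functions is parsing list-valued columns that were written to CSV as strings, such as "['turkey', 'family']". The sketch below assumes that is how token lists in a file like dftokens.csv were stored, which this README does not confirm. `parse_tokens` is a hypothetical name.

```python
import ast


def parse_tokens(cell):
    """Turn a string such as "['turkey', 'family']" back into a Python list.

    ast.literal_eval only evaluates Python literals (strings, numbers,
    lists, dicts, ...), so it is safer than eval() on data read from CSV.
    """
    value = ast.literal_eval(cell)
    if not isinstance(value, list):
        raise ValueError(f"expected a list literal, got {type(value).__name__}")
    return value


# In PySpark the same function could be wrapped as a UDF, e.g.
#   from pyspark.sql.functions import udf
#   from pyspark.sql.types import ArrayType, StringType
#   parse_tokens_udf = udf(parse_tokens, ArrayType(StringType()))
# (not executed here; it requires a Spark session)

if __name__ == "__main__":
    print(parse_tokens("['turkey', 'family', 'thanks']"))
```

Using `literal_eval` rather than `eval` means a malformed or malicious cell raises an error instead of executing arbitrary code.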

Project website: http://thanksgivingontwitter.weebly.com/

Project screencast: https://www.youtube.com/watch?v=UCImCJhoTgc
