📄 Published in the *International Journal of Data Management and Data Insights* as "PyFin-sentiment: Towards a machine-learning-based model for deriving sentiment from financial tweets".
My master's thesis. The full write-up can be found in this other repo.
# Developing a sentiment analysis model for financial social media posts
There is a large body of research on sentiment analysis models for social media posts (Hutto & Gilbert, 2014; Barbieri et al., 2020) and on sentiment analysis of financial texts such as news and corporate filings (Loughran & McDonald, 2011; Araci, 2019). However, research on financial social media posts (think StockTwits, Reddit's r/wallstreetbets, and Twitter) is limited.
Researchers often use sentiment models from the adjacent domains of finance or generic social media. We therefore benchmark the most common of these models: VADER (Hutto & Gilbert, 2014), NTUSD-Fin (Chen et al., 2018), FinBERT (Araci, 2019), and TwitterRoBERTa (Barbieri et al., 2020).
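Two of these baselines, NTUSD-Fin and VADER, are lexicon-based: they look words up in a polarity dictionary and aggregate the scores. The sketch below illustrates only that basic idea, assuming a hypothetical six-word lexicon (`TOY_LEXICON`) and plain averaging. It is not VADER's actual algorithm (which adds rules for negation, intensifiers and punctuation) and not the real NTUSD-Fin word list; the ±0.05 neutral band only loosely mirrors the cut-off commonly used with VADER's compound score.

```python
import re

# Hypothetical toy lexicon (word -> polarity). Real lexicons such as
# NTUSD-Fin or VADER's are far larger; VADER also applies extra rules.
TOY_LEXICON: dict[str, float] = {
    "bullish": 1.0,
    "moon": 0.8,
    "buy": 0.5,
    "bearish": -1.0,
    "crash": -0.9,
    "sell": -0.5,
}

# Lower-case runs of letters, "$" (cashtags) and apostrophes.
TOKEN_RE = re.compile(r"[a-z$']+")


def lexicon_score(text: str, lexicon: dict[str, float] = TOY_LEXICON) -> float:
    """Average polarity of the lexicon words found in `text` (0.0 if none)."""
    tokens = TOKEN_RE.findall(text.lower())
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def lexicon_label(text: str, threshold: float = 0.05) -> str:
    """Map the averaged score to bullish / bearish / neutral."""
    score = lexicon_score(text)
    if score > threshold:
        return "bullish"
    if score < -threshold:
        return "bearish"
    return "neutral"


label = lexicon_label("Bullish on $AAPL, buy the dip")
```

Because the score depends only on which dictionary words occur, a phrase such as "not bullish" is still labeled bullish by this sketch.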
We collect and label 10,000 tweets and train a variety of sentiment analysis models, comparing their performance and compute footprints. The detailed methodology can be found here. The final models will be open-sourced and available for anyone to use as pyFin-sentiment, a Python package for sentiment analysis of financial social media posts.
Out-of-sample ROC AUC of proposed and existing models on the collected dataset of 10,000 tweets.
Out-of-sample ROC AUC of proposed and existing models on a dataset of StockTwits posts (the Fin-SoMe dataset compiled by Chen et al., 2020).
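ROC AUC can be read as the probability that a model scores a randomly chosen positive example above a randomly chosen negative one. A minimal pairwise sketch for the binary case, assuming 0/1 labels and real-valued scores with ties counted as one half, is below. The captions do not state how the multi-class case was handled; averaging one-vs-rest AUCs (as `sklearn.metrics.roc_auc_score(..., multi_class="ovr")` does) is one common option and is mentioned here only as an assumption.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Binary ROC AUC by pairwise comparison; equals the Mann-Whitney U
    statistic divided by n_pos * n_neg."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# One of the four positive/negative pairs is mis-ordered (0.35 < 0.4),
# so the AUC is 3/4.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

The double loop is quadratic and meant for illustration only; library implementations sort the scores instead.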
Compute footprint, measured as inference time per sample (ms) on a system with an AMD Ryzen 5 3600 CPU and 64 GB of RAM.
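A per-sample figure like this can be obtained by timing batch prediction and dividing by the batch size. The sketch assumes a `predict` callable that takes a list of texts, wall-clock timing with `time.perf_counter`, and a best-of-`repeats` rule to reduce noise; the actual measurement protocol (batch size, warm-up, number of runs) is not stated here, and `dummy_predict` is a hypothetical stand-in for a real model.

```python
import time
from typing import Callable, Sequence


def ms_per_sample(predict: Callable[[Sequence[str]], object],
                  texts: Sequence[str], repeats: int = 5) -> float:
    """Best-of-`repeats` wall-clock inference time per sample, in milliseconds."""
    if not texts:
        raise ValueError("texts must be non-empty")
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(texts)  # one batched call over all texts
        best = min(best, time.perf_counter() - start)
    return best / len(texts) * 1000.0


def dummy_predict(batch: Sequence[str]) -> list[int]:
    """Hypothetical stand-in model: 'predicts' the length of each text."""
    return [len(t) for t in batch]


print(f"{ms_per_sample(dummy_predict, ['long $AAPL'] * 1000):.4f} ms/sample")
```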
This work set out to publish a usable model artifact that gives future research more accurate sentiment assessments. We therefore publish the proposed logistic regression model in an easy-to-use Python library called pyFin-sentiment.
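For readers unfamiliar with the model class, the sketch below fits a bag-of-words logistic regression with stochastic gradient descent using only the standard library. It is binary (bullish = 1, bearish = 0) for brevity, and the six training tweets, the tokenizer and the hyperparameters (`epochs`, `lr`, `l2`) are hypothetical. It does not reproduce the features, label scheme or training setup of the published pyFin-sentiment model.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lower-case runs of letters, "$" (cashtags) and apostrophes.
TOKEN_RE = re.compile(r"[a-z$']+")

# Hypothetical toy training data (1 = bullish, 0 = bearish).
TEXTS = [
    "bullish on $aapl",
    "buy the dip",
    "great earnings beat",
    "bearish, selling everything",
    "sell this garbage",
    "crash incoming, short",
]
LABELS = [1, 1, 1, 0, 0, 0]


def featurize(text: str) -> Counter:
    """Bag-of-words token counts."""
    return Counter(TOKEN_RE.findall(text.lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(weights: dict[str, float], bias: float, feats: Counter) -> float:
    """Linear score: bias plus weighted token counts (unseen tokens weigh 0)."""
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train_logreg(texts: list[str], labels: list[int], epochs: int = 200,
                 lr: float = 0.5, l2: float = 1e-3) -> tuple[dict[str, float], float]:
    """Fit binary logistic regression by SGD on the log-loss."""
    weights: dict[str, float] = {}
    bias = 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            err = sigmoid(logit(weights, bias, feats)) - y  # d(log-loss)/d(logit)
            bias -= lr * err  # the bias is not regularized
            for tok, n in feats.items():
                w = weights.get(tok, 0.0)
                # Gradient step plus L2 shrinkage, applied only to tokens in this example.
                weights[tok] = w - lr * (err * n + l2 * w)
    return weights, bias


def predict_proba(model: tuple[dict[str, float], float], text: str) -> float:
    """Estimated probability that `text` is bullish."""
    weights, bias = model
    return sigmoid(logit(weights, bias, featurize(text)))


model = train_logreg(TEXTS, LABELS)
p_bullish = predict_proba(model, "bullish buy")
```

Each learned weight is the change in the log-odds of the bullish class per occurrence of that token, so the fitted model can be inspected by sorting `weights`.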
- Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
- Barbieri, F., Camacho-Collados, J., Neves, L., & Espinosa-Anke, L. (2020). TweetEval: Unified benchmark and comparative evaluation for tweet classification. arXiv preprint arXiv:2010.12421.
- Chen, C.-C., Huang, H.-H., & Chen, H.-H. (2018). NTUSD-Fin: A market sentiment dictionary for financial social media data applications. In Proceedings of the 1st Financial Narrative Processing Workshop (FNP 2018).
- Chen, C.-C., Huang, H.-H., & Chen, H.-H. (2020). Issues and perspectives from 10,000 annotated financial social media data. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 6106–6110).
- Hutto, C., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 8, pp. 216–225).
- Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.