This repository contains a Colab notebook for fine-tuning DistilBERT on Google's GoEmotions dataset.

kanchanraiii/SentimentAnalysis_GoEmotions

Fine-Tuning DistilBERT on GoEmotions for Sentiment Analysis

This project demonstrates how to fine-tune the DistilBERT model on the GoEmotions dataset to perform sentiment analysis. The notebook provides a step-by-step guide, from data preprocessing to model evaluation.

Introduction

Sentiment analysis is a natural language processing (NLP) task that determines the emotional tone of text. This project uses GoEmotions, a collection of human-annotated Reddit comments labeled with 27 emotion categories, to fine-tune DistilBERT, a smaller, faster, and lighter version of BERT.

Dataset

The GoEmotions dataset consists of approximately 58,000 Reddit comments labeled across 27 emotion categories (plus a neutral label), including joy, sadness, and anger. This fine-grained label set enables models capable of nuanced emotion detection.
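As a rough illustration of how these labels might be prepared for training, the sketch below turns a comment's label ids into a fixed-length 0/1 vector, since a GoEmotions comment can carry more than one emotion. It assumes the Hugging Face Hub copy of the dataset (`go_emotions`, `simplified` configuration, 28 classes including neutral, with `text` and `labels` fields); the notebook itself may load or encode the data differently. The helper names `multi_hot` and `load_goemotions` are hypothetical.

```python
from typing import List

# Assumption: 27 emotions + neutral, as in the "simplified" Hub configuration.
NUM_LABELS = 28


def multi_hot(label_ids: List[int], num_labels: int = NUM_LABELS) -> List[float]:
    """Turn a list of label ids into a fixed-length 0/1 vector."""
    vec = [0.0] * num_labels
    for i in label_ids:
        if not 0 <= i < num_labels:
            raise ValueError(f"label id {i} is outside 0..{num_labels - 1}")
        vec[i] = 1.0
    return vec


def load_goemotions():
    """Download GoEmotions from the Hugging Face Hub (needs network access).

    The import is done lazily so that multi_hot() can be used without
    the `datasets` package installed.
    """
    from datasets import load_dataset

    # Assumption: dataset id and configuration name as published on the Hub.
    return load_dataset("go_emotions", "simplified")
```

With the dataset loaded, something like `multi_hot(ds["train"][0]["labels"])` would give the target vector for the first training comment. Floats are used rather than ints because multi-label losses usually expect float targets.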

Model

DistilBERT is a distilled version of BERT that is 40% smaller and 60% faster while retaining 97% of BERT's language understanding capabilities. Fine-tuning it on the GoEmotions dataset yields an efficient model for emotion-level sentiment analysis.
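A minimal sketch of how such a model could be set up with the `transformers` library is shown below. It is not taken from the notebook: the checkpoint name `distilbert-base-uncased`, the 28-label head, and the multi-label problem type are all assumptions, and `build_model` and `predict_labels` are hypothetical helper names. The small thresholding function shows how raw logits from a multi-label head could be turned into predicted label ids.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the indices whose sigmoid probability exceeds the threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) > threshold]


def build_model(num_labels: int = 28):
    """Load DistilBERT with a fresh classification head (downloads weights).

    Imported lazily so the helpers above work without `transformers`.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    checkpoint = "distilbert-base-uncased"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=num_labels,
        # Assumption: multi-label setup, which selects a sigmoid-based loss.
        problem_type="multi_label_classification",
    )
    return tokenizer, model
```

The 0.5 threshold is only a starting point; in practice it is often tuned on the validation split, per label or globally.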

Dependencies

To run the notebook, ensure you have the following dependencies installed: transformers, datasets, torch, scikit-learn, pandas, and numpy.

You can install the required packages using:

pip install transformers datasets torch scikit-learn pandas numpy
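To confirm the installation before opening the notebook, a quick check along these lines may help. It is a sketch using only the standard library; the `missing_packages` helper and the `REQUIRED` mapping are hypothetical names, and the mapping simply pairs each pip name from the command above with its import name (note that scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> top-level module name used in `import` statements
REQUIRED: Dict[str, str] = {
    "transformers": "transformers",
    "datasets": "datasets",
    "torch": "torch",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "numpy": "numpy",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]
```

Calling `missing_packages()` returns an empty list when everything is importable, and otherwise the names to pass to `pip install`.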
