


Adesoji1/Semantic_Similarity


Semantic_Similarity

Given a text and a reason, predict whether the text satisfies the reason. The train file may be used for any training, and metrics are reported on the evaluation file.
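To make the task concrete, the sketch below scores a (text, reason) pair with a simple bag-of-words cosine similarity and turns the score into a 0/1 label with a threshold. This is a lexical stand-in swapped in for illustration only, not the transformer models this repository uses; `tokenize`, `cosine_similarity`, `predict` and the `threshold=0.2` default are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words term-count vectors of a and b."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(text, reason, threshold=0.2):
    """Return 1 if the text is judged to satisfy the reason, else 0.

    The 0.2 threshold is a hypothetical default to be tuned on held-out data.
    """
    return int(cosine_similarity(text, reason) >= threshold)


print(predict("The delivery was late and the package was damaged", "late delivery"))  # 1
print(predict("Great food and friendly staff", "late delivery"))  # 0
```

A semantic model (such as the sentence-transformers model named in the solution list) would replace the bag-of-words vectors with sentence embeddings; the thresholding step stays the same.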

Dataset information

  • The CSV files have 3 columns
    • text
    • reason: a short description
    • label:
      • 0: text does not satisfy the reason
      • 1: text satisfies the reason
  • The dataset has been partially cleaned; further inspection may reveal remaining issues.
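A minimal sketch for loading one of the CSV files and checking it against the column description above, using only the standard library. It assumes the header row uses exactly the names `text`, `reason` and `label` (order not assumed); dropping rows with blank text or reason is a hypothetical extra cleaning step, and `load_rows` / `label_counts` are illustrative names.

```python
import csv
import io
from collections import Counter

# Column names as listed in the README; column order is not assumed.
EXPECTED_COLUMNS = {"text", "reason", "label"}


def load_rows(fileobj):
    """Read (text, reason, label) rows, dropping rows with blank text or reason."""
    reader = csv.DictReader(fileobj)
    if set(reader.fieldnames or []) != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        text, reason = row["text"].strip(), row["reason"].strip()
        if not text or not reason:
            continue  # Extra cleaning step: skip incomplete rows.
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        rows.append({"text": text, "reason": reason, "label": label})
    return rows


def label_counts(rows):
    """Count how many rows carry each label."""
    return Counter(row["label"] for row in rows)


# Tiny in-memory example; with a real file, open it with newline="".
sample = io.StringIO(
    "text,reason,label\n"
    "Parcel arrived two days late,late delivery,1\n"
    "  ,empty text,1\n"
)
rows = load_rows(sample)
print(label_counts(rows))  # Counter({1: 1})
```

Running `label_counts` on the train file is a quick way to confirm that it contains only label 1.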

Note: The train dataset is intentionally small and contains only positive samples (label 1).
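Because the train file has no label-0 rows, negatives have to be synthesized. One common technique, shown here as a hedged sketch rather than the repository's own method, is random reason swapping: pair each text with reasons that were never paired with it in the positive data. The function name `generate_negatives` and its parameters are hypothetical, and a swapped reason may still happen to be satisfied by the text, so these negatives can be noisy.

```python
import random
from collections import defaultdict


def generate_negatives(positives, num_per_text=1, seed=0):
    """Build synthetic label-0 rows by pairing each text with reasons that were
    never paired with it in the positive data (random reason swapping)."""
    rng = random.Random(seed)
    reasons_by_text = defaultdict(set)
    for text, reason in positives:
        reasons_by_text[text].add(reason)
    all_reasons = sorted(set().union(*reasons_by_text.values()))
    negatives = []
    for text, own_reasons in reasons_by_text.items():
        # Only reasons never seen with this text are eligible as negatives.
        candidates = [r for r in all_reasons if r not in own_reasons]
        k = min(num_per_text, len(candidates))
        for reason in rng.sample(candidates, k):
            negatives.append((text, reason, 0))
    return negatives


positives = [
    ("Parcel arrived two days late", "late delivery"),
    ("The soup was served cold", "cold food"),
    ("Staff ignored my questions", "rude staff"),
]
for row in generate_negatives(positives):
    print(row)
```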

Files

https://drive.google.com/drive/folders/1HInfR5Sspv-k3rMPgJyXjXiJJEoCyOtY?usp=sharing

Solution

The Python scripts in this repository address the issues below. They were run on Google Colab; the script can be found here.

  1. Required packages
  2. Label class imbalance
    • Data insights:
      • Baseline approach (transformer models only)
      • Training approach (transformer models only)
      • Artificial negative-sample generation techniques
  3. Metrics
  4. Ablation study table (results of different model architectures compared)
  5. Learning-rate tuning
  6. Learning-rate scheduler
  7. A pre-trained model designed specifically for semantic similarity, such as sentence-transformers/bert-base-nli-mean-tokens
  8. Insufficient data (identified in the data insights analysis)
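For the Metrics item, the sketch below computes accuracy together with precision, recall and F1 for label 1 from lists of true and predicted labels. If the evaluation labels are imbalanced, accuracy alone can be misleading, which is why the per-class scores are included. `binary_metrics` is a hypothetical helper; scikit-learn's `accuracy_score` and `precision_recall_fscore_support` compute the same quantities.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy plus precision, recall and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with label 1 as the positive class.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```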
