Semantic_Similarity

Given a text and a reason, predict whether the text satisfies the reason. Use the train file for training and report metrics on the evaluation file.

Dataset information

  • The CSV files have 3 columns
    • text
    • reason: a short description
    • label:
      • 0: text does not satisfy the reason
      • 1: text satisfies the reason
  • The dataset has been cleaned to a certain extent. You can probe more.
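As a quick first probe of the three columns described above, the label distribution can be counted with the standard library alone. This is a minimal sketch: the helper names `load_rows` and `label_counts` are hypothetical, and the inline rows are made up for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_rows(handle):
    """Read rows with text, reason, and label columns; cast label to int."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {"text": row["text"], "reason": row["reason"], "label": int(row["label"])}
        )
    return rows


def label_counts(rows):
    """Count how many rows carry each label value."""
    return Counter(row["label"] for row in rows)


# Tiny inline stand-in for a real CSV file (made-up rows, illustration only).
sample_csv = io.StringIO(
    "text,reason,label\n"
    "the app crashes on login,app crashes,1\n"
    "great camera quality,battery drains fast,0\n"
)

rows = load_rows(sample_csv)
counts = label_counts(rows)
print(counts)
```

For a real file, the `io.StringIO` object would be swapped for something like `open(path, newline="", encoding="utf-8")`, where `path` points at one of the downloaded CSV files.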

Note: The train dataset is intentionally small and contains only positive samples.
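Because the train file has only positive samples, a classifier needs negatives from somewhere. One common option, shown here as a sketch and not necessarily the technique used in this repository, is to pair each text with a reason taken from a different row. The function name `make_negatives` and the sample rows are hypothetical.

```python
import random


def make_negatives(positives, seed=0):
    """Pair each text with a reason it is not known to satisfy (label 0)."""
    rng = random.Random(seed)
    # Pairs that are known positives must never be emitted as negatives.
    known_pairs = {(p["text"], p["reason"]) for p in positives}
    reasons = sorted({p["reason"] for p in positives})

    negatives = []
    for p in positives:
        candidates = [r for r in reasons if (p["text"], r) not in known_pairs]
        if not candidates:
            continue  # no mismatched reason available for this text
        negatives.append(
            {"text": p["text"], "reason": rng.choice(candidates), "label": 0}
        )
    return negatives


# Made-up positive rows, for illustration only.
positives = [
    {"text": "the app crashes on login", "reason": "app crashes", "label": 1},
    {"text": "battery dies within an hour", "reason": "battery drains fast", "label": 1},
    {"text": "delivery took three weeks", "reason": "late delivery", "label": 1},
]

negatives = make_negatives(positives)
for row in negatives:
    print(row)
```

Negatives produced this way are noisy: a randomly chosen reason can still happen to be true for a text, so a manual spot check of the generated pairs is worthwhile.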

Files

https://drive.google.com/drive/folders/1HInfR5Sspv-k3rMPgJyXjXiJJEoCyOtY?usp=sharing

Solution Below

The Python scripts in this repository address the issues below. They were run on Google Colab; the script can be found here.

  1. Required packages
  2. Label class Imbalance
    • Data insights:
      • Baseline approach (use only transformer models)
      • Training approach (use only transformer models)
      • Artificial negative-sample generation techniques
  3. Metrics
  4. Ablation study table (comparison of results across different model architectures)
  5. Fine-tuned the learning rate.
  6. Used a learning rate scheduler.
  7. Used a pre-trained model specifically designed for semantic similarity, such as sentence-transformers/bert-base-nli-mean-tokens.
  8. Insufficient data, as identified in the data insights analysis
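For the metrics item in the list above, the usual binary-classification scores can be computed directly from the 0/1 labels. This is a minimal sketch: the README does not state which metrics the scripts report, so accuracy, precision, recall, and F1 are assumed here, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard every division so empty or one-sided inputs return 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

With a train set that starts out positive-only, precision and recall on the evaluation file are usually more informative than accuracy alone, since accuracy can look good for a model that predicts a single class.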