Comprehensive NLP Evaluation System
Updated Aug 8, 2024 - Python
This repo houses experimental projects inspired by insights from the course 'Building and Evaluating Advanced RAG Applications' offered by DeepLearning.AI.
A structured evaluation pipeline for LLM-generated outputs in financial supervision contexts. Combines PRA-aligned prompts, thread-type detection, and metric-level meta-review to assess relevance, justification, and actionability across 50+ regulatory and conversational metrics.
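As a rough illustration of what such a metric-level review loop might look like (the repo's actual prompts, metric names, and scoring scale are not shown here; the rubric, the `judge` callable, and the 0-1 scale below are all assumptions): each output is scored per metric by a judge function, and per-metric scores are then aggregated for meta-review.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical rubric; the actual pipeline covers 50+ regulatory and conversational metrics.
METRICS = ["relevance", "justification", "actionability"]

@dataclass
class MetricResult:
    metric: str
    score: float      # assumed 0-1 scale
    rationale: str

def review_output(answer: str,
                  context: str,
                  judge: Callable[[str, str, str], MetricResult]) -> Dict[str, MetricResult]:
    """Score one LLM output against every metric using a judge callable
    (e.g. a prompted LLM or a heuristic)."""
    return {m: judge(answer, context, m) for m in METRICS}

def meta_review(results: List[Dict[str, MetricResult]]) -> Dict[str, float]:
    """Aggregate per-metric scores across many outputs for the meta-review step."""
    return {m: mean(r[m].score for r in results) for m in METRICS}
```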
This repository contains the dataset and code used in our paper, “MENA Values Benchmark: Evaluating Cultural Alignment and Multilingual Bias in Large Language Models.” It provides tools to evaluate how large language models represent Middle Eastern and North African cultural values across 16 countries, multiple languages, and perspectives.
Detecting hallucinations in LLM-generated answers using cross-checking consistency across models. Implements and extends the SAC3 method to smaller open-source models.
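A minimal sketch of the cross-checking idea, not the repo's actual code: ask several models the same question plus one or more paraphrases, then treat low agreement across the answers as a hallucination signal. The `agree` helper, the model callables, and the threshold are illustrative assumptions; the full SAC3 method separates question-level and model-level checks, which this sketch collapses into a single agreement score.

```python
from itertools import combinations
from typing import Callable, List

def consistency_score(question: str,
                      paraphrases: List[str],
                      models: List[Callable[[str], str]],
                      agree: Callable[[str, str], bool]) -> float:
    """Fraction of pairwise-consistent answers across models and rephrasings.

    `models` are callables mapping a prompt to an answer (e.g. wrappers around
    smaller open-source checkpoints); `agree` decides whether two answers are
    semantically equivalent (exact match, NLI, or an LLM judge).
    """
    prompts = [question] + paraphrases
    answers = [m(p) for m in models for p in prompts]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

def likely_hallucination(question, paraphrases, models, agree, threshold=0.5):
    # Low cross-model / cross-paraphrase agreement flags a potential hallucination.
    return consistency_score(question, paraphrases, models, agree) < threshold
```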
NLP evaluation tool prepared for the Udacity Front End Development Nanodegree. A first attempt at writing unit tests with Jest. The app no longer works properly because the Aylien API trial period has ended.
LLM behavior QA: tone collapse, false consent, and reroute logic scoring.
Framework for testing Generative Dialog Models (GDMs)