Comprehensive NLP Evaluation System
Updated Aug 8, 2024 - Python
This repo houses experimental projects inspired by insights from the course 'Building and Evaluating Advanced RAG Applications' offered by DeepLearning.AI.
A structured evaluation pipeline for LLM-generated outputs in financial supervision contexts. Combines PRA-aligned prompts, thread-type detection, and metric-level meta-review to assess relevance, justification, and actionability across 50+ regulatory and conversational metrics.
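As a rough illustration of what such a metric-level review loop might look like (the repo's actual prompts, metric names, and scoring scale are not shown here; the rubric, the `judge` callable, and the 0-1 scale below are all assumptions): each output is scored per metric by a judge function, and per-metric scores are then aggregated for meta-review.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical rubric; the actual pipeline covers 50+ regulatory and conversational metrics.
METRICS = ["relevance", "justification", "actionability"]

@dataclass
class MetricResult:
    metric: str
    score: float      # assumed 0-1 scale
    rationale: str

def review_output(answer: str,
                  context: str,
                  judge: Callable[[str, str, str], MetricResult]) -> Dict[str, MetricResult]:
    """Score one LLM output against every metric using a judge callable
    (e.g. a prompted LLM or a heuristic)."""
    return {m: judge(answer, context, m) for m in METRICS}

def meta_review(results: List[Dict[str, MetricResult]]) -> Dict[str, float]:
    """Aggregate per-metric scores across many outputs for the meta-review step."""
    return {m: mean(r[m].score for r in results) for m in METRICS}
```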
This repository contains the dataset and code used in our paper, “MENA Values Benchmark: Evaluating Cultural Alignment and Multilingual Bias in Large Language Models.” It provides tools to evaluate how large language models represent Middle Eastern and North African cultural values across 16 countries, multiple languages, and perspectives.
Detecting hallucinations in LLM-generated answers using cross-checking consistency across models. Implements and extends the SAC3 method to smaller open-source models.
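A minimal sketch of the cross-checking idea, not the repo's actual code: ask several models the same question plus one or more paraphrases, then treat low agreement across the answers as a hallucination signal. The `agree` helper, the model callables, and the threshold are illustrative assumptions; the full SAC3 method separates question-level and model-level checks, which this sketch collapses into a single agreement score.

```python
from itertools import combinations
from typing import Callable, List

def consistency_score(question: str,
                      paraphrases: List[str],
                      models: List[Callable[[str], str]],
                      agree: Callable[[str, str], bool]) -> float:
    """Fraction of pairwise-consistent answers across models and rephrasings.

    `models` are callables mapping a prompt to an answer (e.g. wrappers around
    smaller open-source checkpoints); `agree` decides whether two answers are
    semantically equivalent (exact match, NLI, or an LLM judge).
    """
    prompts = [question] + paraphrases
    answers = [m(p) for m in models for p in prompts]
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(agree(a, b) for a, b in pairs) / len(pairs)

def likely_hallucination(question, paraphrases, models, agree, threshold=0.5):
    # Low cross-model / cross-paraphrase agreement flags a potential hallucination.
    return consistency_score(question, paraphrases, models, agree) < threshold
```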
NLP evaluation tool prepared for the Udacity Front End Development Nanodegree. A first attempt at writing unit tests with Jest. The app no longer works properly because the Aylien API trial period has ended.
LLM behavior QA: tone collapse, false consent, and reroute logic scoring.
Framework for testing Generative Dialog Models (GDMs)