A Survey of Direct Preference Optimization
Updated Mar 12, 2025
Official implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning" (AAAI 2025)
Notebooks to create an instruction-following version of Microsoft's Phi 2 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
Notebooks to create an instruction-following version of Microsoft's Phi 1.5 LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO)
The Rap Music Generator is an LLM-based tool for generating rap lyrics. It supports multiple fine-tuning approaches, giving users a versatile platform for producing stylistically varied content.
EPFLLaMA: A lightweight language model fine-tuned on EPFL curriculum content. Specialized for STEM education and multiple-choice question answering. Implements SFT, DPO, and quantization.
[Paper: CC (Springer) 2025] [Official code] - Engaging preference optimization alignment in large language model for continual radiology report generation: A hybrid approach
Enhancing paraphrase-type generation with Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. The project aligns model outputs with human-ranked preference data for robust, safety-focused NLP.
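
Most of the projects listed above build on the same underlying objective. For readers new to the topic, here is a minimal, illustrative PyTorch sketch of the DPO loss from Rafailov et al. (2023); the function name and the toy log-probability values are hypothetical and not taken from any repository listed here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss over a batch of preference pairs.

    Each argument is a tensor of summed log-probabilities that the policy
    (or the frozen reference model) assigns to the chosen / rejected
    completion of each pair; beta scales the implicit reward.
    """
    # Implicit rewards are the log-ratios of policy to reference probabilities.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry preference likelihood: maximize the reward margin
    # between the chosen and rejected completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of 3 pairs.
policy_chosen = torch.tensor([-12.0, -9.5, -11.0])
policy_rejected = torch.tensor([-13.0, -10.0, -10.5])
ref_chosen = torch.tensor([-12.5, -9.8, -11.2])
ref_rejected = torch.tensor([-12.8, -9.9, -10.9])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

In practice the log-probabilities come from a forward pass of the policy and a frozen copy of the SFT model over prompt-completion pairs; the loss above is what frameworks in these repositories minimize in place of a separate reward model and RL loop.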