CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
code-generation dpo large-language-models reinforcement-learning-from-human-feedback llm-as-a-judge codeultrafeedback
Updated Jun 25, 2024 - Python