Align Anything: Training All-modality Model with Feedback
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A Deep Learning NLP repository covering everything from TensorFlow-based text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always keeping a data-first approach.
Technical analysis library for .NET
CodeUltraFeedback: aligning large language models to coding preferences
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning
[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β
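Several entries in this list (HALOs, Notus, the self-training work, β-DPO) build on the DPO objective. As a shared reference point, below is a minimal PyTorch sketch of the standard DPO loss. The function name and argument layout are illustrative, not the API of any library listed here, and the per-sequence log-probabilities are assumed to be precomputed.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # summed log-probs of preferred responses under the policy
    policy_rejected_logps: torch.Tensor,  # summed log-probs of dispreferred responses under the policy
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # KL temperature; the quantity that beta-DPO adjusts dynamically
) -> torch.Tensor:
    """Standard DPO objective: -log sigmoid(beta * gap in policy/reference log-ratios)."""
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()
```

The static `beta` here is exactly the hyperparameter that β-DPO replaces with a dynamically calibrated value per batch.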