[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward (see the loss sketch after this list)
[ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
DPO-Shift: Shifting the Distribution of Direct Preference Optimization
Survey of preference alignment algorithms
Generate synthetic datasets for instruction tuning and preference alignment of large language models using tools like `distilabel` for efficient and scalable data creation.
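Below is a minimal sketch of the SimPO objective from the first entry above, not the official implementation: SimPO replaces DPO's reference-model reward with a length-normalized log-probability of the policy itself and adds a target reward margin `gamma`. The function signature, the helper inputs (`*_logps`, `*_lens`), and the default hyperparameter values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps: torch.Tensor,
               rejected_logps: torch.Tensor,
               chosen_lens: torch.Tensor,
               rejected_lens: torch.Tensor,
               beta: float = 2.0,
               gamma: float = 0.5) -> torch.Tensor:
    """Sketch of the SimPO loss.

    chosen_logps / rejected_logps: summed token log-probs of each response
    under the current policy; chosen_lens / rejected_lens: response lengths
    in tokens. These inputs are assumed to be computed from your own model
    and tokenizer; the default beta/gamma values are illustrative only.
    """
    # Reference-free, length-normalized implicit rewards.
    chosen_reward = beta * chosen_logps / chosen_lens
    rejected_reward = beta * rejected_logps / rejected_lens
    # Bradley-Terry-style logistic loss with a target reward margin gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()
```

Because no reference model is needed, the per-pair inputs reduce to the policy's log-probabilities and the response lengths, which keeps the training loop close to a standard DPO setup minus the frozen reference forward pass.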