# DPO
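The repositories below implement or build on Direct Preference Optimization (DPO) and related alignment techniques (SFT, RLHF, KTO, ORPO, GRPO). For orientation, here is a minimal PyTorch sketch of the standard DPO objective (Rafailov et al., 2023); the function and variable names are illustrative and are not taken from any project listed here.

```python
# Minimal sketch of the DPO loss. Inputs are per-sequence log-probabilities of
# the chosen (preferred) and rejected responses under the trained policy and
# the frozen reference model; beta controls how far the policy may drift from
# the reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    logps = [torch.randn(4) for _ in range(4)]
    print(dpo_loss(*logps).item())
```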

shibing624

#LLM# MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python · 3.82k stars · updated 7 hours ago
PKU-Alignment

Jupyter Notebook · 3.38k stars · updated 3 days ago
ContextualAI

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python · 830 stars · updated 3 days ago
jianzhnie

#LLM# Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA 2, LLaMA 3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

Python · 599 stars · updated 3 months ago
ukairia777

#NLP# A deep learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks for recent models such as topic models, BERT, GPT, and LLMs.

Jupyter Notebook · 541 stars · updated 7 months ago
zhaorw02

#LLM# Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning.

Python · 478 stars · updated 7 days ago
dvlab-research

#LLM# Implementation of "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs".

Python · 358 stars · updated 3 months ago
sail-sg

#LLM# 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, and more.

Python · 320 stars · updated 3 days ago
TUDB-Labs

Python · 308 stars · updated 2 months ago
armbues

#LLM# SiLLM simplifies training and running large language models (LLMs) on Apple silicon by leveraging the MLX framework.

Python · 262 stars · updated 7 days ago
RockeyCoss

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization.

Python · 199 stars · updated 10 days ago
YangLing0818

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation.

Python · 180 stars · updated 2 months ago
TideDra

#LLM# An RLHF infrastructure for vision-language models.

Python · 171 stars · updated 5 months ago
argilla-io

Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always with a data-first approach.

Python · 167 stars · updated 1 year ago
anilca

C# · 141 stars · updated 7 months ago
NiuTrans

#LLM# Code for SFT, RLHF, and DPO designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-Vision models.

Python · 105 stars · updated 6 months ago
martin-wey

CodeUltraFeedback: aligning large language models to coding preferences.

Python · 71 stars · updated 10 months ago
YangLing0818

#LLM# [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction.

Python · 66 stars · updated 25 days ago
TianduoWang

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning.

Python · 41 stars · updated 9 months ago
junkangwu

[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β.

Python · 41 stars · updated 6 months ago
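The β-DPO entry above tunes the DPO trade-off coefficient β dynamically instead of fixing it. The sketch below only illustrates that general idea under stated assumptions (a batch-level β scaled by the observed reward margin, with an illustrative `alpha` coefficient); it is not the paper's exact calibration rule.

```python
# Illustrative sketch of a dynamic-beta variant of the DPO loss. The scaling
# rule below (tanh of the mean batch margin) is an assumption made for
# demonstration, not the formulation from the beta-DPO paper.
import torch
import torch.nn.functional as F

def dynamic_beta_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                          ref_chosen_logps, ref_rejected_logps,
                          beta0=0.1, alpha=0.5):
    # Per-pair implicit reward margin (before scaling by beta).
    margin = ((policy_chosen_logps - ref_chosen_logps)
              - (policy_rejected_logps - ref_rejected_logps))
    # Batch-level beta, raised when the batch already shows a clear preference
    # gap and lowered when it does not; detached so beta itself is not trained.
    beta = (beta0 * (1.0 + alpha * torch.tanh(margin.mean()))).detach()
    return -F.logsigmoid(beta * margin).mean()
```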