# DPO
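The repositories below implement or build on Direct Preference Optimization (DPO) and related alignment techniques (SFT, RLHF, KTO, ORPO, GRPO). For orientation, here is a minimal PyTorch sketch of the standard DPO objective (Rafailov et al., 2023); the function and variable names are illustrative and are not taken from any project listed here.

```python
# Minimal sketch of the DPO loss. Inputs are per-sequence log-probabilities of
# the chosen (preferred) and rejected responses under the trained policy and
# the frozen reference model; beta controls how far the policy may drift from
# the reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with random log-probabilities for a batch of 4 preference pairs.
if __name__ == "__main__":
    logps = [torch.randn(4) for _ in range(4)]
    print(dpo_loss(*logps).item())
```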

shibing624

#LLM# MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing incremental pre-training (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

Python · 3.82k stars · updated 7 hours ago
PKU-Alignment

Jupyter Notebook · 3.38k stars · updated 3 days ago
ContextualAI

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

Python · 830 stars · updated 3 days ago
jianzhnie

#LLM# Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA 2, LLaMA 3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

Python · 599 stars · updated 3 months ago
ukairia777

#NLP# A deep learning NLP repository that uses TensorFlow to cover everything from text preprocessing to downstream tasks for recent models such as topic models, BERT, GPT, and LLMs.

Jupyter Notebook · 541 stars · updated 7 months ago
zhaorw02

#LLM# Official code of DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning.

Python · 478 stars · updated 7 days ago
dvlab-research

#LLM# Implementation of "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs".

Python · 358 stars · updated 3 months ago
sail-sg

#LLM# 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, and more.

Python · 320 stars · updated 3 days ago
TUDB-Labs

Python · 308 stars · updated 2 months ago
armbues

#LLM# SiLLM simplifies training and running large language models (LLMs) on Apple silicon by leveraging the MLX framework.

Python · 262 stars · updated 7 days ago
RockeyCoss

[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization.

Python · 199 stars · updated 10 days ago
YangLing0818

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation.

Python · 180 stars · updated 2 months ago
TideDra

#LLM# An RLHF infrastructure for vision-language models.

Python · 171 stars · updated 5 months ago
argilla-io

Notus is a collection of LLMs fine-tuned with SFT, DPO, SFT+DPO, and other RLHF techniques, always with a data-first approach.

Python · 167 stars · updated 1 year ago
anilca

C# · 141 stars · updated 7 months ago
NiuTrans

#LLM# Code for SFT, RLHF, and DPO designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-Vision models.

Python · 105 stars · updated 6 months ago
martin-wey

CodeUltraFeedback: aligning large language models to coding preferences.

Python · 71 stars · updated 10 months ago
YangLing0818

#LLM# [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction.

Python · 66 stars · updated 25 days ago
TianduoWang

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning.

Python · 41 stars · updated 9 months ago
junkangwu

[NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β.

Python · 41 stars · updated 6 months ago
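The β-DPO entry above tunes the DPO trade-off coefficient β dynamically instead of fixing it. The sketch below only illustrates that general idea under stated assumptions (a batch-level β scaled by the observed reward margin, with an illustrative `alpha` coefficient); it is not the paper's exact calibration rule.

```python
# Illustrative sketch of a dynamic-beta variant of the DPO loss. The scaling
# rule below (tanh of the mean batch margin) is an assumption made for
# demonstration, not the formulation from the beta-DPO paper.
import torch
import torch.nn.functional as F

def dynamic_beta_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                          ref_chosen_logps, ref_rejected_logps,
                          beta0=0.1, alpha=0.5):
    # Per-pair implicit reward margin (before scaling by beta).
    margin = ((policy_chosen_logps - ref_chosen_logps)
              - (policy_rejected_logps - ref_rejected_logps))
    # Batch-level beta, raised when the batch already shows a clear preference
    # gap and lowered when it does not; detached so beta itself is not trained.
    beta = (beta0 * (1.0 + alpha * torch.tanh(margin.mean()))).detach()
    return -F.logsigmoid(beta * margin).mean()
```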