grpo · GitHub Topics

#大语言模型#Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL2.5,...

大语言模型 lora llama sft deploy multimodal peft internvl liger qwen2-vl qwen2-5 distill rft deepseek-r1 embedding grpo open-r1 llama4

Python 6.9 k

12 小时前

om-ai-lab / VLM-R1

#大语言模型#Solve Visual Understanding with Reinforced VLMs

deepseek-r1 grpo 大语言模型 multimodal vlm qwen reinforcement-learning

Python 4.56 k

2 天前

turningpoint-ai / VisualThinker-R1-Zero

Explore the Multimodal “Aha Moment” on 2B Model

multimodal reasoning grpo reinforcement-learning deepseek deepseek-r1 deepseek-r1-zero

Python 558

25 天前

modelscope / awesome-deep-reasoning

Collect every awesome work about r1!

collection deepseek grpo o1 qwen rl reasoning

Python 333

13 天前

sail-sg / oat

#大语言模型#🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

alignment dpo 大语言模型 rlhf distributed-training reasoning grpo ppo

Python 319

5 天前

jianzhnie / Open-R1

#大语言模型#The open source implementation of DeepSeek-R1. 开源复现 DeepSeek-R1

大语言模型 rlhf deepseek-r1 grpo deepseek-v3

Python 252

1 个月前

hustvl / AlphaDrive

Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning

autonomous-driving grpo planning reasoning reinforcement-learning vision-language-model

Python 196

18 天前

yihedeng9 / OpenVLThinker

OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement

grpo rl vision-language-model

Python 69

16 天前

superlinear-ai / microGRPO

🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper

autograd grpo NumPy reinforcement-learning

Python 30

2 个月前

cnsdqd-dyb / Guide-GRPO

Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended.

deepseek-r1 grpo

Python 27

2 个月前

sdiehl / tiny-r1

Recreating the minimal training methods of DeepSeek-R1 for small langauge models.

grpo reasoning

Python 20

2 个月前

kyegomez / OpenR1

An open source implementation of R1

agents 人工智能 china deepseek grpo llms 机器学习 rl

Python 19

2 天前

tyler-romero / microR1

Simple repository for training small reasoning models

deepseek grpo reasoning

Python 12

2 个月前

BY571 / DistRL-LLM

#大语言模型#Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization

大语言模型 llm-training reinforcement-learning grpo pg multi-gpu-inference

Python 11

1 个月前

NJUxlj / Travel-Agent-based-on-Qwen2-RLHF

A travel agent based on Qwen2.5, fine-tuned by SFT + DPO/PPO/GRPO using traveling question-answer dataset, a mindmap can be output using the response. A RAG system is build upon the tuned qwen2, using...

agent dpo lora qwen2 rag rlhf grpo langchain ppo tool-use

Python 9

4 天前