#LLM#Use PEFT or full-parameter training to fine-tune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inte...
#LLM#Solve Visual Understanding with Reinforced VLMs
Explore the multimodal "aha moment" on a 2B model
Collecting every awesome work about R1!
#LLM#The open-source reproduction of DeepSeek-R1
#LLM#🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
Aims for memory-efficient training (24 GB VRAM) on consumer GPUs; optimizes language models through guidance tokens in reasoning chains, based on DeepSeekRL-Extended.
🐭 A tiny single-file implementation of Group Relative Policy Optimization (GRPO) as introduced by the DeepSeekMath paper
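The core idea behind GRPO can be sketched in a few lines: instead of training a value critic, each prompt's group of sampled completions is scored, and every completion's advantage is its reward normalized by the group mean and standard deviation. A minimal sketch of that group-relative step (function name and example rewards are illustrative, not from the repo):

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in the DeepSeekMath GRPO paper:
    normalize each sampled completion's reward by the group's mean
    and standard deviation instead of using a learned value critic."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal groups
    return [(r - mean) / std for r in rewards]

# One prompt, a group of 4 sampled completions scored by a reward function:
advs = grpo_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions above the group mean get positive advantages (their tokens are reinforced), those below get negative ones, and the advantages of each group sum to zero.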
Recreating the minimal training methods of DeepSeek-R1 for small language models.
A reinforcement learning agent that learns to solve mazes using Group Relative Policy Optimization (GRPO).
#LLM#Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization
A travel agent based on Qwen2.5, fine-tuned with SFT + DPO/PPO/GRPO on a travel question-answer dataset; a mind map can be generated from the response. A RAG system is built upon the tuned Qwen2, using...
#Algorithm Practice#Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
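A "weighted reward function" in this setting typically combines several scalar signals (format compliance, answer correctness, etc.) with weights that can then be searched over, e.g. with Optuna. A hypothetical sketch of such a reward, not taken from the repo:

```python
def combined_reward(completion, answer, weights=(0.2, 0.8)):
    """Hypothetical weighted reward for GRPO-style training:
    a small bonus for emitting the <think>...</think> reasoning
    format plus a larger bonus for a correct final answer.
    The weights are the kind of knob one might tune with Optuna."""
    w_fmt, w_ans = weights
    fmt = 1.0 if "<think>" in completion and "</think>" in completion else 0.0
    ans = 1.0 if completion.rstrip().endswith(str(answer)) else 0.0
    return w_fmt * fmt + w_ans * ans

r = combined_reward("<think>2+2</think> 4", 4)  # 0.2 + 0.8 = 1.0
```

Keeping each component in [0, 1] makes the weights directly interpretable as the relative importance of format versus correctness.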
PowerPoint slides explaining the paper DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning