Search results for "rlhf" | GitHub Chinese Community

#Computer Science# Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence attention-mechanisms deep-learning reinforcement-learning transformers

Python 7.72 k

1 year ago

text-matching reinforcement-learning transformers text-classification large-language-models rlhf reinforcement-learning-from-human-feedback human-feedback deep-learning natural-language-processing

OpenRLHF

@OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

deepspeed transformers vllm large-language-models raylib

Python 2.81 k

14 hours ago

awesome-RLHF

@opendilab

#Computer Science# A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning deep-reinforcement-learning human-feedback reinforcement-learning rlhf

3.5 k

6 hours ago

MOSS-RLHF

@OpenLMLab

MOSS-RLHF

Python 1.3 k

9 months ago

hh-rlhf

@anthropics

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1.63 k

1 year ago

safe-rlhf

@PKU-Alignment

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Python 1.36 k

6 months ago

alignment-handbook

Hugging Face @huggingface

#Large Language Models# Robust recipes to align language models with human and AI preferences

llm rlhf transformers

Python 4.75 k

10 days ago

RLHF-Reward-Modeling

@RLHFlow

Recipes to train reward model for RLHF.

Python 955

13 days ago

RLHF

@sunzeyeah

Implementation of Chinese ChatGPT

Python 287

1 year ago

RLHF

@HumanSignal

Collection of links, tutorials and best practices of how to collect the data and build end-to-end RLHF system to finetune Generative AI models

Jupyter Notebook 197

1 year ago

alpaca_farm

@tatsu-lab

#Natural Language Processing# A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.

deep-learning instruction-following large-language-models reinforcement-learning-from-human-feedback natural-language-processing

Python 762

5 months ago

LLaVA-RLHF

@llava-rlhf

Aligning LMMs with Factually Augmented RLHF

Python 324

1 year ago

instructGOOSE

@xrsrke

Implementation of Reinforcement Learning from Human Feedback (RLHF)

Jupyter Notebook 170

2 years ago

ChatGLM-RLHF

@Miraclemarvel55

Apply RLHF directly to ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF

Python 178

2 years ago

trlx

@CarperAI

#Computer Science# A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

machine-learning PyTorch reinforcement-learning

Python 4.51 k

1 year ago

minChatGPT

@ethanyanjiali

A minimum example of aligning language models with RLHF similar to ChatGPT

Python 197

1 year ago

LaMDA-rlhf-pytorch

@conceptofmind

Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.

Python 460

9 months ago

RLHF-Label-Tool

@SupritYoung

A tool for manually annotating and ranking response data in the RLHF stage of large-model training.

Python 241

1 year ago

awesome-llm-human-preference-datasets

@glgh

A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.

319

1 year ago

Vicuna-LoRA-RLHF-PyTorch

@jackaduma

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Vicuna architecture. Basically ChatGPT...

Python 208

6 months ago

transformers_tasks

@HarderThenHarder

#Natural Language Processing# ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.

natural-language-processing text-classification text-matching information-extraction reinforcement-learning

Jupyter Notebook 2.17 k

1 year ago 🇨🇳

alpaca-rlhf

@l294265421

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

Python 97

1 year ago

ChatGLM-LoRA-RLHF-PyTorch

@jackaduma

A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the ChatGLM architecture. Basically ChatG...

Python 120

2 years ago
