Code for "Learning to summarize from human feedback"
A curated list of reinforcement learning from human feedback (RLHF) resources (continually updated)
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Code for Deep RL from Human Preferences [Christiano et al.], plus a web app for collecting human feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
This is the official repo for the paper "Zeroth-Order Optimization Meets Human Feedback: Provable Learning via Ranking Oracles", Tang et al. https://arxiv.org/abs/2303.03751
Implementation of Reinforcement Learning from Human Feedback (RLHF)
🔬 Where students practice and receive automated and human feedback
Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat
[ICCV 2021, Oral] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Send better feedback
Stepper feedback controller
Feedback form with screenshot
Feedback tool similar to Google's.