Search results for "reinforcement-learning-from-human-feedback"

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Python 1.36 k

6 months ago

reinforcement-learning transformers large-language-models reinforcement-learning-from-human-feedback deep-reinforcement-learning deep-neural-networks human-feedback deep-learning machine-learning artificial-intelligence

hh-rlhf

@anthropics

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1.63 k

1 year ago

instructGOOSE

@xrsrke

Implementation of Reinforcement Learning from Human Feedback (RLHF)

Jupyter Notebook 170

2 years ago

rlhf-book

@natolambert

Textbook on reinforcement learning from human feedback

TeX 76

1 month ago

Reinforcement-Learning-from-Human-Feedback

@ksm26

Embark on the "Reinforcement Learning from Human Feedback" course and align Large Language Models (LLMs) with human values.

Jupyter Notebook 6

10 months ago

awesome-rlhf

@louieworth

An index of algorithms for reinforcement learning from human feedback (RLHF)

7 months ago

Okapi

@nlp-uoregon

Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback

Python 91

1 year ago

LLM-RLHF-Tuning-with-PPO-and-DPO

@raghavc

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various conf...

Python 120

8 months ago

InstructLLaMA

@michaelnny

Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructG...

Jupyter Notebook 46

9 months ago

OpenRLHF

@OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)

deepspeed transformers vllm large-language-models raylib

Python 2.8 k

1 day ago

awesome-RLHF

@opendilab

#Computer Science# A curated list of reinforcement learning with human feedback resources (continually updated)

deep-learning deep-reinforcement-learning human-feedback reinforcement-learning rlhf

3.48 k

5 days ago

summarize-from-feedback

OpenAI @openai

Code for "Learning to summarize from human feedback"

Python 992

1 year ago

trlx

@CarperAI

#Computer Science# A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

machine-learning PyTorch reinforcement-learning

Python 4.51 k

1 year ago

alpaca-rlhf

@l294265421

Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat

Python 97

1 year ago

PaLM-rlhf-pytorch

Phil Wang @lucidrains

#Computer Science# Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

artificial-intelligence attention-mechanisms deep-learning reinforcement-learning transformers

Python 7.71 k

10 months ago

DQN-tensorflow

Devsisters Corp. @devsisters

Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning

Python 2.49 k

6 years ago

TextRL

@voidful

Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Python 534

7 months ago

cogment-verse

@cogment

Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning & Reinforcement Learning

Python 41

2 years ago

sutton-barto-rl-exercises

@zyxue

📖 Learning reinforcement learning by implementing the algorithms from "Reinforcement Learning: An Introduction"

Jupyter Notebook 80

2 years ago

ngsim_env

@sisl

Learning human driver models from NGSIM data with imitation learning.

Jupyter Notebook 173

5 years ago

drl_grasping

@AndrejOrsula

Deep Reinforcement Learning for Robotic Grasping from Octrees

Python 352

2 years ago

deep-learning-drizzle

Mario @kmario23

#Natural Language Processing# Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!

machine-learning deep-learning deep-neural-networks pattern-recognition machine-vision

HTML 12.33 k

1 month ago