Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Implementation of Reinforcement Learning from Human Feedback (RLHF)
Textbook on reinforcement learning from human feedback
Embark on the "Reinforcement Learning from Human Feedback" course and align Large Language Models (LLMs) with human values.
An index of algorithms for reinforcement learning from human feedback (RLHF)
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various conf...
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF), to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructG...
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention)
#Computer Science# A curated list of reinforcement learning with human feedback resources (continually updated)
Code for "Learning to summarize from human feedback"
Fine-tuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
#Computer Science# Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning
Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning & Reinforcement Learning
📖 Learning reinforcement learning by implementing the algorithms from "Reinforcement Learning: An Introduction"
Deep Reinforcement Learning for Robotic Grasping from Octrees
#Natural Language Processing# Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
[T-ITS] Driving Behavior Modeling using Naturalistic Human Driving Data with Inverse Reinforcement Learning
Code from the Deep Reinforcement Learning in Action book from Manning, Inc
Project for paper "Learning 3D Human Dynamics from Video"
TensorFlow Reinforcement Learning
Deep Reinforcement Learning