#Computer Science# DeepSpeed Chat: one-click RLHF training that makes your ChatGPT-like 100B-parameter models 15x faster and cheaper
#Large Language Models# Run Mixtral-8x7B models in Colab or on consumer desktops
#Computer Science# Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
Mixture-of-Experts for Large Vision-Language Models
#Large Language Models# An optimizing inference proxy for LLMs
PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. (https://arxiv.org/abs/1701.06538); a minimal gating sketch appears after this list
Codebase for Aria - an Open Multimodal Native MoE
#Large Language Models# ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
#Natural Language Processing# Tutel MoE: An Optimized Mixture-of-Experts Implementation
#Computer Science# Surrogate Modeling Toolbox
#Computer Science# A TensorFlow Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
#Computer Science# A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
#Large Language Models# A from-scratch implementation of a sparse mixture-of-experts language model, inspired by Andrej Karpathy's makemore :)
#Natural Language Processing# Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
#Natural Language Processing# A library for easily merging multiple LLM experts and efficiently training the merged LLM.
#Computer Science# Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch
#Computer Science# GMoE could be the next backbone model for many kinds of generalization tasks.
#Computer Science# Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
MoH: Multi-Head Attention as Mixture-of-Head Attention
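Several of the entries above (notably the Shazeer et al. re-implementation and the sparsely-gated MoE library) are built around the same top-k gating idea: a small router scores every token against a pool of expert feed-forward networks, and only the k best-scoring experts run for that token. The snippet below is a minimal sketch of that technique in plain PyTorch; the class name `TopKGatedMoE` and all hyperparameters are illustrative assumptions and do not reflect the API of any repository listed here.

```python
# Minimal sparsely-gated mixture-of-experts sketch in the spirit of
# Shazeer et al. (2017). All names and sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKGatedMoE(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.gate(tokens)                          # (n_tokens, n_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep k best experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the chosen k

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # (token, slot) pairs that were routed to expert e
            token_idx, slot_idx = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue
            out[token_idx] += weights[token_idx, slot_idx].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKGatedMoE(d_model=64, d_hidden=256, n_experts=8, top_k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

Production MoE stacks such as the ones listed above typically add load-balancing auxiliary losses, expert capacity limits, and distributed all-to-all dispatch on top of this basic routing loop.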