#LLM# Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
#LLM# SGLang is a fast serving framework for large language models and vision language models.
Mixture-of-Experts for Large Vision-Language Models
#LLM# MoBA: Mixture of Block Attention for Long-Context LLMs
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (see the top-k gating sketch after this list)
#LLM# ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
#LLM# Tutel MoE: An Optimized Mixture-of-Experts Implementation
#Computer Science# Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
#LLM# A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
#NLP# Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)
#NLP# MindSpore online courses: Step into LLM
#Android# Official LISTEN.moe Android app
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
#Android# A libGDX cross-platform API for in-app purchasing.
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models.
MoH: Multi-Head Attention as Mixture-of-Head Attention
[ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
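Several of the entries above revolve around the same core idea from the Shazeer et al. paper linked in the re-implementation item: route each token to only the top-k of many expert feed-forward networks. Below is a minimal, hypothetical PyTorch sketch of that top-k routing, not code from any repository listed here; it omits the noise term and load-balancing loss of the original paper, and names such as `TopKMoE`, `d_hidden`, and `k` are invented for illustration.

```python
# Sketch of sparsely-gated top-k MoE routing (simplified from Shazeer et al., 2017).
# Not taken from the repositories above; for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Gating projection producing one logit per expert.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Each expert is an independent two-layer feed-forward network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model). Keep only the k highest-scoring experts per token.
        logits = self.gate(x)                                   # (batch, num_experts)
        topk_val, topk_idx = logits.topk(self.k, dim=-1)        # (batch, k)
        weights = F.softmax(topk_val, dim=-1)                   # renormalize over selected experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e)                              # tokens routed to expert e
            token_mask = mask.any(dim=-1)
            if not token_mask.any():
                continue
            gate_w = (weights * mask).sum(dim=-1)[token_mask]   # gate weight for expert e
            out[token_mask] += gate_w.unsqueeze(-1) * expert(x[token_mask])
        return out


if __name__ == "__main__":
    layer = TopKMoE(d_model=16, d_hidden=32, num_experts=4, k=2)
    y = layer(torch.randn(5, 16))
    print(y.shape)  # torch.Size([5, 16])
```

Production MoE frameworks in this list (e.g. Tutel, SGLang) replace the per-expert Python loop with batched dispatch/combine kernels and expert parallelism, but the routing logic is the same.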