#Large Language Model#A high-throughput and memory-efficient inference and serving engine for LLMs
#Computer Science#Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Translation - A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
#Large Language Model#This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).
#Large Language Model#SGLang is a fast serving framework for large language models and vision language models.
#Large Language Model#Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
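Several engines in this list (this one, vLLM, SGLang) expose the OpenAI chat-completions wire format, so existing OpenAI clients work by pointing them at the new base URL. A stdlib-only sketch of the request payload such an endpoint expects (the model id and host are illustrative assumptions):

```python
import json

# JSON body for an OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "deepseek-ai/DeepSeek-R1",  # assumed model id; use whatever the server hosts
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.7,
    "max_tokens": 64,
}
body = json.dumps(payload)
# POST `body` to http://<your-host>/v1/chat/completions with headers
# {"Authorization": "Bearer <api-key>", "Content-Type": "application/json"}.
print(body)
```

Because the format is shared, swapping serving backends usually only requires changing the base URL and model name, not the client code.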
#Computer Science#SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 14+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
#Large Language Model#The easiest way to serve AI apps and models - build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
Translation - Model serving made easy.
#Vector Search Engine#Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
#Large Language Model#Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
#Large Language Model#AICI: Prompts as (Wasm) Programs
#Large Language Model#MoBA: Mixture of Block Attention for Long-Context LLMs
#Large Language Model#RayLLM - LLMs on Ray
#Large Language Model#A highly optimized LLM inference acceleration engine for Llama and its variants.
#Large Language Model#A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully utilize your compute resources
#Large Language Model#A throughput-oriented, high-performance serving framework for LLMs
#Large Language Model#RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
#Large Language Model#LLM (Large Language Model) Fine-Tuning
#Computer Science#Efficient AI Inference & Serving
#Large Language Model#🧬 Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.
#Large Language Model#A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.