FlashInfer: Kernel Library for LLM Serving
#LLM# LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Serving multiple LoRA-finetuned LLMs as one
Serverless LLM Serving for Everyone.
#NLP# Large language models (LLMs) made easy; EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax.
#LLM# Dynamic RAG for enterprise. Ready to run with Docker, ⚡ in sync with SharePoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.
Yahoo! Cloud Serving Benchmark
Kubernetes-based, scale-to-zero, request-driven compute
favicon serving middleware
Serving TensorFlow models with TensorFlow Serving 📙
faiss serving :)
A flexible, high-performance carrier for machine learning models (the serving and deployment framework for PaddlePaddle)
A framework for serving GraphQL from Laravel
A low-latency prediction-serving system