llm-inference · GitHub Topics

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 73.08 k

24 days ago

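#Large language models# Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Translation - A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib (a scalable reinforcement learning library) and Tune (a scalable hyperparameter tuning library).

Python 36.52 k

5 hours ago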

gitleaks / gitleaks

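#Large language models# Gitleaks is an open-source SAST (static application security testing) command-line tool that scans Git repositories to keep secrets such as passwords, API keys, and access tokens from being hard-coded into code.

security Git Go secret gitleaks devsecops Hacktoberfest CI/CD cli data-loss-prevention dlp Open Source ai-powered llm llm-inference llm-training

Go 19.48 k

10 hours ago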

liguodongiot / llm-action

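#Large language models# This project shares the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in practice).

llm llm-inference llm-serving llm-training llmops

HTML 16.39 k

1 month ago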

Lightning-AI / litgpt

#Large language models# 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

artificial-intelligence deep-learning large-language-models llm llm-inference llms

Python 11.95 k

2 days ago

bentoml / OpenLLM

#Large language models# Run any open-source LLM, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.

llm llmops model-inference fine-tuning llm-serving llama vicuna bentoml llama2 llm-inference llm-ops mistral mlops llama3-1

Python 11.12 k

2 days ago

mistralai / mistral-inference

#Large language models# Official inference library for Mistral models

llm llm-inference mistralai

Jupyter Notebook 10.17 k

23 days ago

SJTU-IPADS / PowerInfer

#Large language models# PowerInfer is a fast large-model serving engine that runs on consumer-grade GPUs and personal computers.

large-language-models llama llm llm-inference local-inference

C++ 8.17 k

2 months ago

openvinotoolkit / openvino

#Natural language processing# OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Translation - OpenVINO™ toolkit repository

inference deep-learning openvino artificial-intelligence computer-vision diffusion-models generative-ai llm-inference natural-language-processing performance-boost speech-recognition stable-diffusion deploy-ai optimize-ai transformers yolo recommendation-system good-first-issue

C++ 8.11 k

1 day ago

bentoml / BentoML

#Large language models# The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Translation - Model serving made easy

model-serving mlops llmops generative-ai llm-inference deep-learning llm-serving machine-learning Python multimodal ml-engineering llm

Python 7.6 k

16 hours ago

InternLM / lmdeploy

#Large language models# LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

cuda-kernels deepspeed fastertransformer llm-inference turbomind internlm llama llm codellama llama2 llama3

Python 6.07 k

2 days ago

superduper-io / superduper

#Vector search engine# Superduper: End-to-end framework for building custom AI applications and agents.

artificial-intelligence mlops torch transformers MongoDB Python PyTorch machine-learning database data inference llm-inference pretrained-models chatbot semantic-search llm-serving llmops vector-search rag

Python 5.03 k

3 days ago

kserve / kserve

#Computer science# Standardized Serverless ML Inference Platform on Kubernetes

knative machine-learning model-interpretability model-serving istio kubeflow artificial-intelligence Tensorflow PyTorch scikit-learn xgboost Kubernetes service-mesh Hacktoberfest mlops genai llm-inference

Python 4.06 k

4 hours ago

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM🔥 Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1

Python 3.82 k

1 day ago

neuralmagic / deepsparse

#Natural language processing# Sparsity-aware deep learning inference runtime for CPUs

machine-learning onnx inference computer-vision object-detection pruning quantization pretrained-models natural-language-processing cpus sparsification llm-inference performance

Python 3.13 k

9 months ago

NVIDIA / GenerativeAIExamples

#Large language models# Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

gpu-acceleration large-language-models llm llm-inference microservices nemo rag retrieval-augmented-generation tensorrt triton-inference-server

Python 2.99 k

25 days ago

predibase / lorax

#Large language models# Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama llm llm-inference llm-serving llmops lora model-serving PyTorch transformers

Python 2.94 k

1 month ago

FellouAI / eko

Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai

agent agentic-ai agentic-framework agentic-workflow computeruse natural-language-inference workflow rag agents chain-of-thought genai llm-inference llmapi prompt-engineering llm-agents ai-agents browser-automation computer-automation

TypeScript 2.93 k

11 hours ago

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

flash-attention gpu CUDA PyTorch llm-inference jit

Cuda 2.63 k

2 days ago

databricks / dbrx

#Large language models# Code examples and resources for DBRX, a large language model developed by Databricks

databricks gen-ai generative-ai llm llm-inference llm-training mosaic-ai

Python 2.55 k

1 year ago