#LLM# A high-throughput and memory-efficient inference and serving engine for LLMs
#LLM# The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
#Computer Science# Useful notes and references on deploying deep-learning-based models in production.
#Computer Science# Standardized Serverless ML Inference Platform on Kubernetes
#Computer Science# FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on a...
#NLP# LightLLM is a Python-based LLM inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys...
#LLM# Multi-LoRA inference server that scales to thousands of fine-tuned LLMs
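Multi-LoRA serving works because a LoRA fine-tune is just the shared base weights plus a small low-rank update, W' = W + B·A, so thousands of adapters can ride on one resident base model. A minimal pure-Python sketch of that idea with toy matrices (not code from any of the servers listed here):

```python
# Toy illustration of the low-rank (LoRA) update W' = W + B @ A.
# Pure Python, no dependencies; matrices are lists of rows.
# Conceptual sketch only, not a serving framework's actual code.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B @ A) without modifying W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# One shared 2x2 base weight; two rank-1 adapters (B: 2x1, A: 1x2).
# Adapter names are made up for illustration.
W = [[1.0, 0.0],
     [0.0, 1.0]]
adapters = {
    "customer-a": ([[1.0], [0.0]], [[0.5, 0.5]]),
    "customer-b": ([[0.0], [1.0]], [[0.25, 0.0]]),
}

# A multi-LoRA server keeps W resident and swaps in the tiny (B, A)
# pair per request instead of loading a full fine-tuned model.
for name, (B, A) in adapters.items():
    print(name, apply_lora(W, B, A))
```

The memory win is the point: each adapter above stores 4 numbers instead of a second full copy of W, which is why one GPU can hold the base model once and page adapters in and out cheaply.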
🏕️ Reproducible development environment
#LLM# AICI: Prompts as (Wasm) Programs
Olares: An Open-Source Sovereign Cloud OS for Local AI
#Computer Science# MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates t...
#Computer Science# Hopsworks - Data-Intensive AI platform with a Feature Store
#Computer Science# The simplest way to serve AI/ML models in production
#LLM# A highly optimized LLM inference acceleration engine for Llama and its variants.
#LLM# A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully utilize your compute resources
#Computer Science# Model Deployment at Scale on Kubernetes 🦄️
#LLM# A throughput-oriented, high-performance serving framework for LLMs
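Dynamic batching, which several of the frameworks above advertise, groups requests that arrive close together so the model runs once per batch instead of once per request. A minimal sketch of the usual policy (flush on max batch size or on a wait deadline; toy code under assumed names, not any framework's API):

```python
import time

# Toy dynamic batcher: collects requests until either max_size is
# reached or max_wait seconds have elapsed since the first queued
# request, then emits the whole batch at once. Conceptual sketch only.

class DynamicBatcher:
    def __init__(self, max_size=4, max_wait=0.01, clock=time.monotonic):
        self.max_size = max_size
        self.max_wait = max_wait
        self.clock = clock          # injectable clock for testing
        self.queue = []
        self.first_arrival = None

    def submit(self, request):
        """Queue a request; return a full batch if one is ready."""
        if not self.queue:
            self.first_arrival = self.clock()
        self.queue.append(request)
        if len(self.queue) >= self.max_size:
            return self._flush()
        return None

    def poll(self):
        """Flush a partial batch once the wait deadline has passed."""
        if self.queue and self.clock() - self.first_arrival >= self.max_wait:
            return self._flush()
        return None

    def _flush(self):
        batch, self.queue = self.queue, []
        self.first_arrival = None
        return batch

# Usage: a size-triggered flush.
b = DynamicBatcher(max_size=3, max_wait=1.0)
assert b.submit("r1") is None
assert b.submit("r2") is None
print(b.submit("r3"))  # -> ['r1', 'r2', 'r3']
```

The `max_wait` knob is the latency/throughput trade-off: a longer wait yields bigger batches (better GPU utilization) at the cost of higher tail latency for the first request in the batch.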
#Data Warehouse# An open-source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
#Computer Science# A scalable inference server for models optimized with OpenVINO™