#Large Language Model#A high-throughput and memory-efficient inference and serving engine for LLMs
#Computer Science#Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Translation - A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
#Large Language Model#This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).
#Large Language Model#SGLang is a fast serving framework for large language models and vision language models.
#Large Language Model#Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
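Several engines in this list (this one, vLLM, SGLang) expose the OpenAI chat-completions wire format, so existing OpenAI clients work by pointing them at the new base URL. A stdlib-only sketch of the request payload such an endpoint expects (the model id and host are illustrative assumptions):

```python
import json

# JSON body for an OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "deepseek-ai/DeepSeek-R1",  # assumed model id; use whatever the server hosts
    "messages": [{"role": "user", "content": "Say hello."}],
    "temperature": 0.7,
    "max_tokens": 64,
}
body = json.dumps(payload)
# POST `body` to http://<your-host>/v1/chat/completions with headers
# {"Authorization": "Bearer <api-key>", "Content-Type": "application/json"}.
print(body)
```

Because the format is shared, swapping serving backends usually only requires changing the base URL and model name, not the client code.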
#Computer Science#SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 14+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
#Large Language Model#The easiest way to serve AI apps and models - build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
Translation - Model serving made easy.
#Vector Search Engine#Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
#Large Language Model#Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
#Large Language Model#AICI: Prompts as (Wasm) Programs
#Large Language Model#MoBA: Mixture of Block Attention for Long-Context LLMs
#Large Language Model#RayLLM - LLMs on Ray
#Large Language Model#A highly optimized LLM inference acceleration engine for Llama and its variants.
#Large Language Model#A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully utilize your compute resources
#Large Language Model#A throughput-oriented, high-performance serving framework for LLMs
#Large Language Model#RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
#Large Language Model#LLM (Large Language Model) Fine-Tuning
#Computer Science#Efficient AI Inference & Serving
#Large Language Model#🧬 Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class testing.
#Large Language Model#A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.