#Large Language Model# Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
#Large Language Model# Data processing with ML, LLM and Vision LLM
📚 A curated list of Awesome LLM/VLM 🔥 Inference Papers with code: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.
#Large Language Model# The goal of RamaLama is to make working with AI boring.
#Large Language Model# 🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
#Large Language Model# Evaluate your LLM's response with Prometheus and GPT4 💯
#Large Language Model# AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
#Large Language Model# LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
Model swapping for llama.cpp (or any local OpenAI-compatible server)
#Large Language Model# Community-maintained hardware plugin for vLLM on Ascend
#Large Language Model# [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
#Large Language Model# Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.
#Large Language Model# Low-latency JSON generation using LLMs ⚡️
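#Large Language Model# A lightweight, full-pipeline large-model application project that is easy to extend (Large Model Data Assistant). Supports large models such as DeepSeek and Qwen2.5. A one-stop large-model application development project built on Dify, Ollama & vLLM, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. It uses ECharts 📈 to provide large-model-based data...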
#Large Language Model# A large-scale simulation framework for LLM inference