#Large Language Model#Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
#Large Language Model#Data processing with ML, LLM and Vision LLM
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
#Large Language Model#The goal of RamaLama is to make working with AI boring.
#Large Language Model#🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...
#Large Language Model#Evaluate your LLM's response with Prometheus and GPT4 💯
#Large Language Model#AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
#Large Language Model#Notes on LLMs, covering model inference, transformer model structure, and code analysis of LLM frameworks.
#Large Language Model#[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
#Large Language Model#Low latency JSON generation using LLMs ⚡️
Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
#Large Language Model#A large-scale simulation framework for LLM inference
#Large Language Model#Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.
#Large Language Model#Community maintained hardware plugin for vLLM on Ascend
#Large Language Model#The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
#Natural Language Processing#TopicGPT: A Prompt-Based Framework for Topic Modeling (NAACL'24)