A demo of inference and deployment for Qwen (Tongyi Qianwen) with vLLM
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
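Several projects in this list expose an OpenAI-compatible API server. As a rough illustration, the sketch below builds (but does not send) a standard `/v1/chat/completions` request against such a server; the base URL, port, and model name are assumptions for illustration, not taken from any specific project here.

```python
# Hypothetical sketch: constructing a chat-completion request for an
# OpenAI-compatible endpoint such as the one vLLM's API server exposes.
# The URL, port, and model name below are illustrative assumptions.
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:8000", "qwen-7b-chat", "Hello")
```

Because the request follows the OpenAI wire format, the same client code works against vLLM's server, an OpenAI endpoint, or any other compatible gateway by changing only `base_url` and `model`.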
Custom Websearch Agent Built with Local Models, vLLM, and OpenAI
Supports mixed-precision inference with vLLM
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A fork of github.com/vllm-project/vllm
An Open-source Toolkit for LLM Development
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
A demo showcasing vLLM's impressive performance on Chinese large language models
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such ...
An open-source Chinese-English educational dialogue model from ICALK, East China Normal University (general-purpose base model, GPU deployment, data cleaning). Tribute to: LLaMA, MOSS, BELLE, Ziya, vLLM