vllm · GitHub Topics

#LLM#Llama 2 fine-tuning/inference methods and examples

AI finetuning langchain llama llama2 LLM machine-learning Python PyTorch vllm

Jupyter Notebook 17.05 k

2 days ago

#LLM#Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any...

ggml PyTorch chatglm deployment flan-t5 LLM wizardlm AI machine-learning Whisper inference openai-api mistral gemma llama llamacpp vllm qwen llama3 glm4

Python 7.46 k

1 day ago

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

transformers vllm large-language-models raylib reinforcement-learning-from-human-feedback reinforcement-learning openai-o1 proximal-policy-optimization

Python 6.21 k

13 hours ago

katanaml / sparrow

#LLM#Data processing with ML, LLM and Vision LLM

machine-learning huggingface-transformers natural-language-processing computer-vision gpt LLM rag vllm

Python 4.47 k

2 days ago

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM🔥 Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1

Python 3.82 k

1 day ago

containers / ramalama

#LLM#The goal of RamaLama is to make working with AI boring.

AI containers inference-server llamacpp podman vllm LLM

Python 1.52 k

10 hours ago

bricks-cloud / BricksLLM

#LLM#🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI...

Go LLM openai AI anthropic Azure gpt PostgreSQL REST API ycombinator API Docker privacy security generative-ai Open Source self-hosted vllm

Go 1.03 k

3 months ago

prometheus-eval / prometheus-eval

#LLM#Evaluate your LLM's response with Prometheus and GPT4 💯

evaluation LLM llmops Python vllm gpt4 llm-as-a-judge

Python 901

1 month ago

substratusai / kubeai

#LLM#AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

Kubernetes LLM openai-api autoscaler ollama vllm ollama-operator vllm-operator AI Whisper faster-whisper

Go 881

2 days ago

harleyszhang / llm_note

#LLM#LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.

LLM llm-inference vllm cuda-programming kv-cache transformer-models

Python 705

18 hours ago

mustafaaljadery / llama3v

A SOTA vision model built on top of llama3 8B.

llama llama3 vllm

Python 586

10 months ago

jakobdylanc / llmcord

#Frontend#Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

gpt openai gpt-4 Discord chatbot LLM Bot ollama gpt-4o llama3 llama mistral groq xai grok vllm frontend chat

Python 522

7 days ago

mostlygeek / llama-swap

Model swapping for llama.cpp (or any local OpenAPI compatible server)

Go llama llamacpp localllama localllm openai openai-api vllm

Go 506

5 days ago

vllm-project / vllm-ascend

#LLM#Community maintained hardware plugin for vLLM on Ascend

ascend inference LLM llm-serving llmops mlops model-serving transformer vllm

Python 452

3 days ago

ModelTC / llmc

#LLM#[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

deployment LLM pruning quantization tool benchmark evaluation large-language-models internlm2 llama3 smoothquant post-training-quantization mixtral vllm

Python 452

3 days ago

ModelCloud / GPTQModel

Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.

gptq peft quantization transformers vllm

Python 447

19 hours ago

varunvasudeva1 / llm-server-docs

#LLM#Documentation on setting up an LLM server on Debian from scratch, using Ollama/vLLM, Open WebUI, OpenedAI Speech/Kokoro FastAPI, and ComfyUI.

Linux LLM ollama Server open-webui Debian comfyui vllm

421

13 days ago

varunshenoy / super-json-mode

#LLM#Low latency JSON generation using LLMs ⚡️

huggingface-transformers LLM openai vllm

Jupyter Notebook 398

1 year ago

apconw / sanic-web

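#LLM#A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to extend. It supports large models such as DeepSeek and Qwen2.5, and is a one-stop large-model application development project built on Dify, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI built with Vue3, TypeScript, and Vite 5. Through ECharts 📈 it supports large-model-based data...

AI bigdata ChatGPT dify ollama rag vllm chat LLM qwen echarts sanic text2sql Vue.js Python deepseek-r1

JavaScript 362

17 days ago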

microsoft / vidur

#LLM#A large-scale simulation framework for LLM inference

inference LLM Simulation transformer vllm

Python 361

5 months ago