tensorrt-llm · GitHub Topics

📚A curated list of Awesome LLM/VLM🔥 Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

flash-attention tensorrt-llm vllm llm-inference deepseek deepseek-v3 deepseek-r1

Python 3.82 k

1 day ago

collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.

dictation obs openai text-to-speech translation voice-recognition Whisper tensorrt tensorrt-llm whisper-tensorrt

Python 2.69 k

4 days ago

shashikg / WhisperS2T

#Computer Science# An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

asr deep-learning speech-recognition speech-to-text Whisper tensorrt-llm tensorrt vad voice-activity-detection

Jupyter Notebook 388

8 months ago

huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

benchmark onnxruntime openvino PyTorch tensorrt-llm

Python 291

2 months ago

coderonion / awesome-cuda-and-hpc

#Large Language Model# 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

CUDA cublas tensorrt Awesome Lists large-language-model gpu blas PyTorch hpc gemm llama cudnn triton tensorrt-llm cutlass mlir tvm deepseek ptx

240

4 days ago

npuichigo / openai_trtllm

#Large Language Model# OpenAI-compatible API for the TensorRT-LLM Triton backend

langchain large-language-model openai-api tensorrt-llm triton-inference-server

Rust 204

8 months ago

NetEase-Media / grps

Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offeri...

Tensorflow tensorrt torch vllm serving triton-inference-server tensorrt-llm

C++ 157

19 days ago

NetEase-Media / grps_trtllm

#Large Language Model# Higher-performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, d...

large-language-model openai tensorrt-llm chatglm llama3 qwen2 function-call ai-agent llama-index multi-modal deepseek-r1 phi qwq qwen2-vl minicpm-v

Python 128

19 days ago

openhackathons-org / End-to-End-LLM

#Natural Language Processing# This repository is AI Bootcamp material that consists of a workflow for LLMs

deep-learning natural-language-processing p-tuning prompt-tuning large-language-model question-answering tensorrt-llm genai

Jupyter Notebook 84

8 months ago

vossr / Chat-With-RTX-python-api

#Large Language Model# Chat With RTX Python API

large-language-model llm-inference mistral-7b tensorrt tensorrt-llm

Python 64

4 months ago

guidance-ai / llgtrt

TensorRT-LLM server with Structured Outputs (JSON) built with Rust

guidance openai-api tensorrt-llm cfg JSON Regular expression structured-generation

Rust 47

8 days ago

argonne-lcf / LLM-Inference-Bench

#Large Language Model# LLM-Inference-Bench

benchmark deepspeed inference llamacpp large-language-model tensorrt-llm vllm

Jupyter Notebook 39

3 months ago

fgblanch / OutlookLLM

Add-in for the new Outlook that adds new LLM features (Composition, Summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM

tensorrt-llm

Python 37

4 months ago

lix19937 / llm-deploy

#Large Language Model# AI Infra LLM infer/ tensorrt-llm/ vllm

large-language-model llm-inference tensorrt-llm

Python 20

4 months ago

zRzRzRzRzRzRzR / lm-fly

#Large Language Model# Accelerating large-model inference frameworks to make LLMs fly

large-language-model llm-inference MLX openvino tensorrt-llm vllm

Python 19

1 year ago

CactusQ / TensorRT-LLM-Tutorial

Getting started with TensorRT-LLM using BLOOM as a case study

deep-learning Jupyter Notebook llm-inference llms tensorrt tensorrt-llm

Jupyter Notebook 18

1 year ago

EdVince / whisper-trtllm

Whisper in TensorRT-LLM

asr CUDA huggingface openai tensorrt tensorrt-llm transformers Whisper

C++ 15

2 years ago

Delxrius / MiniMax-01

#Large Language Model# MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum p...

chat-api chatbot deepseek deepseek-v3 flash-attention large-language-model llm-inference minimax tensorrt-llm

2 days ago