TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
The Triton TensorRT-LLM Backend
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Mixed-precision inference with TensorRT-LLM
OpenAI-compatible API for the TensorRT-LLM Triton backend
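As a sketch of what "OpenAI-compatible" means for such a frontend: clients send the standard Chat Completions request shape to `/v1/chat/completions`, and the server translates it into Triton/TensorRT-LLM requests. The model name below is a placeholder, not something this repo ships:

```json
{
  "model": "my-trtllm-model",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what TensorRT-LLM does."}
  ],
  "max_tokens": 256,
  "temperature": 0.7,
  "stream": false
}
```

Because the request/response schema follows the OpenAI API, existing OpenAI client libraries can point their base URL at the Triton frontend without code changes.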
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment.
TensorRT LLM Benchmark Configuration
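To make "benchmark configuration" concrete, here is a hypothetical sketch of the kinds of parameters such a configuration typically covers. The field names below are illustrative assumptions, not the repo's actual schema:

```yaml
# Illustrative benchmark config (field names are hypothetical)
model: llama-7b            # engine / checkpoint under test (placeholder name)
precision: fp16            # numeric precision of the built engine
batch_sizes: [1, 8, 32]    # batch sizes to sweep
input_len: 128             # prompt length in tokens
output_len: 512            # generated tokens per request
num_requests: 1000         # total requests per run
concurrency: 16            # in-flight requests at once
metrics: [latency_p50, latency_p99, tokens_per_second]
```

Sweeping batch size and sequence lengths like this is the usual way to map out the throughput/latency trade-off of an engine build.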
📖A curated list of Awesome LLM Inference Papers with code. 🎉🎉
ONNX-TensorRT: TensorRT backend for ONNX
TensorFlow/TensorRT integration
TensorRT for Yolov3
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open-source components of TensorRT.
Simple samples for TensorRT programming
YOLOv5 in TensorRT
YOLOv8 TensorRT C++ Implementation
C++ library based on TensorRT integration
TensorRT C++ API Tutorial
TensorRT Plugin Autogen Tool
PyTorch ,ONNX and TensorRT implementation of YOLOv4
Universal and Transferable Attacks on Aligned Language Models