#Large Language Models# Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
Deploy optimized transformer-based models on the Nvidia Triton inference server.
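As a rough illustration of what querying such a deployment can look like, here is a minimal sketch using the `tritonclient` HTTP client. The model name `transformer_onnx_inference`, the input tensor `TEXT`, and the output tensor `output` are assumptions standing in for whatever names your Triton model repository actually exposes.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server assumed to be listening on the default HTTP port.
client = httpclient.InferenceServerClient(url="127.0.0.1:8000")

text = "This live event is great. I will sign-up for Infinity."

# Hypothetical setup: the server-side pipeline tokenizes raw text itself,
# so the request carries a single BYTES tensor holding the input string.
input_text = httpclient.InferInput("TEXT", shape=[1], datatype="BYTES")
input_text.set_data_from_numpy(np.asarray([text], dtype=object))

# Ask for the (assumed) output tensor and run inference against the
# hypothetical model name registered in the Triton model repository.
output = httpclient.InferRequestedOutput("output")
response = client.infer(
    model_name="transformer_onnx_inference",
    inputs=[input_text],
    outputs=[output],
)
print(response.as_numpy("output"))
```

The exact tensor names, shapes, and datatypes depend on how the model was exported and configured in Triton's `config.pbtxt`, so treat this as a template rather than a drop-in client.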