Fast inference engine for Transformer models
A lightweight, standalone C++ inference engine for Google's Gemma model.
FeatherCNN is a high-performance inference engine for convolutional neural networks.
Inference engine powering open source models on OpenRouter
Prototype type inference engine
Highly optimized inference engine for Binarized Neural Networks
Stock inference engine using Spring XD, Apache Geode / GemFire, and Spark MLlib.
This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
TensorFlow plugin for Unreal Engine using the C API, focused on inference.
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
Inference code for LLaMA models.
Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch); includes a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server -...
Speech-to-text interface for Emacs using OpenAI's Whisper model and whisper.cpp as the inference engine.