High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Batch normalization fusion for PyTorch
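Batch-norm fusion, as in the repo above, folds a BatchNorm layer's scale and shift into the weights of the preceding linear or convolution layer so inference needs one op instead of two. Below is a minimal NumPy sketch of the standard folding identity; the helper name `fuse_bn` and the toy shapes are illustrative, not taken from that project:

```python
import numpy as np

def fuse_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding linear/conv weights.

    y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
      = (scale * W) x + (scale * (b - mean) + beta), scale = gamma / sqrt(var + eps)
    """
    scale = gamma / np.sqrt(var + eps)
    W_fused = W * scale[:, None]          # scale each output channel's weights
    b_fused = scale * (b - mean) + beta   # fold the mean/shift into the bias
    return W_fused, b_fused

# Toy check: the fused layer matches linear -> BN on random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)); b = rng.standard_normal(4)
gamma = rng.standard_normal(4); beta = rng.standard_normal(4)
mean = rng.standard_normal(4); var = rng.random(4) + 0.1
x = rng.standard_normal(8)

ref = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
Wf, bf = fuse_bn(W, b, gamma, beta, mean, var)
assert np.allclose(Wf @ x + bf, ref)
```

The same algebra applies per output channel of a convolution, which is why fusion is exact rather than an approximation.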
Optimize layers structure of Keras model to reduce computation time
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.
Cross-platform modular neural network inference library, small and efficient
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Pred...
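KV caching, mentioned in the course description above, avoids recomputing attention keys and values for the prefix at every decode step: each step appends its key/value once and attends over the stored tensors. A minimal single-head NumPy sketch (the `KVCache` class and shapes are illustrative assumptions):

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only cache: each decode step stores its key/value once."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q, k, v):
        self.K.append(k)
        self.V.append(v)
        return attend(q, np.stack(self.K), np.stack(self.V))

# Cached decoding matches recomputing attention over the full prefix.
rng = np.random.default_rng(1)
d, T = 16, 5
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
cache = KVCache()
for t in range(T):
    cached = cache.step(Q[t], K[t], V[t])
    full = attend(Q[t], K[:t + 1], V[:t + 1])
    assert np.allclose(cached, full)
```

The cache trades memory (one K/V pair per layer per token) for turning each decode step's attention cost from quadratic in sequence length to linear.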
Faster inference YOLOv8: Optimize and export YOLOv8 models for faster inference using OpenVINO and Numpy 🔢
A template for getting started writing code using GGML
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
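Of the techniques listed above, speculative decoding has the least obvious correctness argument: a cheap draft model proposes a token from its distribution q, the target model accepts it with probability min(1, p/q), and rejections are resampled from the residual max(p - q, 0), which provably yields samples from the target distribution p. A toy NumPy sketch of that acceptance rule (distributions and counts are made-up illustrations):

```python
import numpy as np

def speculative_step(p, q, rng):
    """One speculative-decoding acceptance step.

    p: target-model distribution, q: draft-model distribution (both 1-D).
    The draft samples a token from q; it is accepted with probability
    min(1, p/q), otherwise a token is resampled from max(p - q, 0)
    normalized. The returned token is distributed exactly according to p.
    """
    x = rng.choice(len(q), p=q)                # draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):   # accept test
        return x
    resid = np.maximum(p - q, 0.0)             # residual distribution
    return rng.choice(len(p), p=resid / resid.sum())

# Empirical check: outputs follow the target distribution p, not q.
rng = np.random.default_rng(2)
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.4])
samples = [speculative_step(p, q, rng) for _ in range(20000)]
counts = np.bincount(samples, minlength=3)
assert np.allclose(counts / counts.sum(), p, atol=0.02)
```

In practice the draft proposes several tokens at once and the target model scores them in a single batched forward pass, so every accepted token saves one full target-model decode step.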
LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the related paper.
Your AI Catalyst: inference backend to maximize your model's inference performance
A constrained expectation-maximization algorithm for feasible graph inference.
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
Batch Partitioning for Multi-PE Inference with TVM (2020)
MLP-Rank: A graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the related thesis.
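The two PageRank-based pruning entries above share the same core idea: treat neurons as nodes of a weighted graph (edge weights from the network's weight magnitudes), compute weighted PageRank centrality, and prune the lowest-scoring units. A minimal power-iteration sketch in NumPy; the graph, damping factor, and pruning criterion here are generic illustrations, not the exact method of either repo:

```python
import numpy as np

def weighted_pagerank(A, d=0.85, iters=100):
    """Power iteration for PageRank on a weighted adjacency matrix A.

    A[i, j] is the edge weight from node i to node j; each node's
    out-edge weights are normalized into a row-stochastic transition
    matrix before iterating r = (1 - d)/n + d * P^T r.
    """
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    P = np.where(out > 0, A / np.where(out == 0, 1, out), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)
    return r

# Neurons as nodes, |weights| as edges; prune the lowest-centrality node.
rng = np.random.default_rng(3)
A = np.abs(rng.standard_normal((6, 6)))
np.fill_diagonal(A, 0.0)
scores = weighted_pagerank(A)
prune = int(np.argmin(scores))
keep = [i for i in range(6) if i != prune]
assert np.isclose(scores.sum(), 1.0, atol=1e-6)
```

Because pruning whole nodes (rather than individual weights) removes entire rows and columns, this is structured pruning: the smaller network runs on dense kernels with no sparse-format overhead.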