High-efficiency floating-point neural network inference operators for mobile, server, and Web
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
The Tensor Algebra SuperOptimizer for Deep Learning
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
Batch normalization fusion for PyTorch
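Batch-norm fusion, as in the repo above, folds a BatchNorm layer's scale and shift into the weights of the preceding linear or convolution layer so inference needs one op instead of two. Below is a minimal NumPy sketch of the standard folding identity; the helper name `fuse_bn` and the toy shapes are illustrative, not taken from that project:

```python
import numpy as np

def fuse_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer into the preceding linear/conv weights.

    y = gamma * (Wx + b - mean) / sqrt(var + eps) + beta
      = (scale * W) x + (scale * (b - mean) + beta), scale = gamma / sqrt(var + eps)
    """
    scale = gamma / np.sqrt(var + eps)
    W_fused = W * scale[:, None]          # scale each output channel's weights
    b_fused = scale * (b - mean) + beta   # fold the mean/shift into the bias
    return W_fused, b_fused

# Toy check: the fused layer matches linear -> BN on random data.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)); b = rng.standard_normal(4)
gamma = rng.standard_normal(4); beta = rng.standard_normal(4)
mean = rng.standard_normal(4); var = rng.random(4) + 0.1
x = rng.standard_normal(8)

ref = gamma * ((W @ x + b) - mean) / np.sqrt(var + 1e-5) + beta
Wf, bf = fuse_bn(W, b, gamma, beta, mean, var)
assert np.allclose(Wf @ x + bf, ref)
```

The same algebra applies per output channel of a convolution, which is why fusion is exact rather than an approximation.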
Optimize layers structure of Keras model to reduce computation time
A set of tools to make your life easier with TensorRT and ONNX Runtime. This repo is designed for YOLOv3.
Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)
Blog posts, reading reports, and code examples for AGI/LLM-related knowledge.
Cross-platform modular neural network inference library, small and efficient
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Pred...
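KV caching, mentioned in the course description above, avoids recomputing attention keys and values for the prefix at every decode step: each step appends its key/value once and attends over the stored tensors. A minimal single-head NumPy sketch (the `KVCache` class and shapes are illustrative assumptions):

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only cache: each decode step stores its key/value once."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q, k, v):
        self.K.append(k)
        self.V.append(v)
        return attend(q, np.stack(self.K), np.stack(self.V))

# Cached decoding matches recomputing attention over the full prefix.
rng = np.random.default_rng(1)
d, T = 16, 5
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
cache = KVCache()
for t in range(T):
    cached = cache.step(Q[t], K[t], V[t])
    full = attend(Q[t], K[:t + 1], V[:t + 1])
    assert np.allclose(cached, full)
```

The cache trades memory (one K/V pair per layer per token) for turning each decode step's attention cost from quadratic in sequence length to linear.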
Faster inference YOLOv8: Optimize and export YOLOv8 models for faster inference using OpenVINO and Numpy 🔢
A template for getting started writing code using GGML
Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
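Of the techniques listed above, speculative decoding has the least obvious correctness argument: a cheap draft model proposes a token from its distribution q, the target model accepts it with probability min(1, p/q), and rejections are resampled from the residual max(p - q, 0), which provably yields samples from the target distribution p. A toy NumPy sketch of that acceptance rule (distributions and counts are made-up illustrations):

```python
import numpy as np

def speculative_step(p, q, rng):
    """One speculative-decoding acceptance step.

    p: target-model distribution, q: draft-model distribution (both 1-D).
    The draft samples a token from q; it is accepted with probability
    min(1, p/q), otherwise a token is resampled from max(p - q, 0)
    normalized. The returned token is distributed exactly according to p.
    """
    x = rng.choice(len(q), p=q)                # draft proposal
    if rng.random() < min(1.0, p[x] / q[x]):   # accept test
        return x
    resid = np.maximum(p - q, 0.0)             # residual distribution
    return rng.choice(len(p), p=resid / resid.sum())

# Empirical check: outputs follow the target distribution p, not q.
rng = np.random.default_rng(2)
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.4])
samples = [speculative_step(p, q, rng) for _ in range(20000)]
counts = np.bincount(samples, minlength=3)
assert np.allclose(counts / counts.sum(), p, atol=0.02)
```

In practice the draft proposes several tokens at once and the target model scores them in a single batched forward pass, so every accepted token saves one full target-model decode step.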
LLM-Rank: A graph-theoretical approach to structured pruning of large language models based on weighted PageRank centrality, as introduced in the related paper.
Your AI Catalyst: inference backend to maximize your model's inference performance
A constrained expectation-maximization algorithm for feasible graph inference.
🤖️ Optimized CUDA Kernels for Fast MobileNetV2 Inference
Batch Partitioning for Multi-PE Inference with TVM (2020)
MLP-Rank: A graph-theoretical approach to structured pruning of deep neural networks based on weighted PageRank centrality, as introduced in the related thesis.
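The two PageRank-based pruning entries above share the same core idea: treat neurons as nodes of a weighted graph (edge weights from the network's weight magnitudes), compute weighted PageRank centrality, and prune the lowest-scoring units. A minimal power-iteration sketch in NumPy; the graph, damping factor, and pruning criterion here are generic illustrations, not the exact method of either repo:

```python
import numpy as np

def weighted_pagerank(A, d=0.85, iters=100):
    """Power iteration for PageRank on a weighted adjacency matrix A.

    A[i, j] is the edge weight from node i to node j; each node's
    out-edge weights are normalized into a row-stochastic transition
    matrix before iterating r = (1 - d)/n + d * P^T r.
    """
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    P = np.where(out > 0, A / np.where(out == 0, 1, out), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (P.T @ r)
    return r

# Neurons as nodes, |weights| as edges; prune the lowest-centrality node.
rng = np.random.default_rng(3)
A = np.abs(rng.standard_normal((6, 6)))
np.fill_diagonal(A, 0.0)
scores = weighted_pagerank(A)
prune = int(np.argmin(scores))
keep = [i for i in range(6) if i != prune]
assert np.isclose(scores.sum(), 1.0, atol=1e-6)
```

Because pruning whole nodes (rather than individual weights) removes entire rows and columns, this is structured pruning: the smaller network runs on dense kernels with no sparse-format overhead.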