SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
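For orientation, a minimal plain-PyTorch sketch of what symmetric per-tensor INT8 quantization (the simplest of the listed formats) does to a weight tensor; this is illustrative only, not this library's API:

```python
import torch

# Illustrative sketch of symmetric per-tensor INT8 quantization: map the
# largest weight magnitude onto 127, round, and dequantize to see the error.
def int8_roundtrip(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q.float() * scale  # dequantize for comparison

w = torch.randn(256, 256)
print(f"max round-trip error: {(w - int8_roundtrip(w)).abs().max().item():.4f}")
```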
#NLP#Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
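SparseML itself drives sparsification from YAML recipes; as a rough analogue of "a few lines of code", here is unstructured magnitude pruning with PyTorch's built-in torch.nn.utils.prune utilities:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)  # zero 80% of weights
        prune.remove(module, "weight")  # bake the mask into the weight tensor

sparsity = (model[0].weight == 0).float().mean().item()
print(f"layer-0 sparsity: {sparsity:.0%}")
```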
PyTorch native quantization and sparsity for training and inference
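The project also ships newer APIs; as a minimal native-PyTorch example, post-training dynamic quantization stores Linear weights in INT8 and quantizes activations on the fly:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Dynamic quantization: only nn.Linear modules are converted here.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).eval()
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel(torch.randn(1, 64)).shape)  # torch.Size([1, 8])
```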
PaddleSlim is an open-source library for deep model compression and architecture search.
#Computer Science#A toolkit for optimizing Keras and TensorFlow ML models for deployment, including quantization and pruning.
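A minimal sketch of the toolkit's two headline features, magnitude pruning and quantization-aware training, on a tiny Keras model:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(10),
])
# Magnitude pruning: wrap layers so weights are sparsified during fine-tuning.
pruned = tfmot.sparsity.keras.prune_low_magnitude(model)
# Quantization-aware training: insert fake-quant ops that model INT8 inference.
quantized = tfmot.quantization.keras.quantize_model(model)
```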
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
#NLP#Neural Network Compression Framework for enhanced OpenVINO™ inference
#Computer Science#Network Slimming (PyTorch) (ICCV 2017)
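The method's two ingredients, sketched in plain PyTorch (illustrative, not the repository's code): an L1 penalty on BatchNorm scale factors during training, then keeping only channels whose learned scale exceeds a threshold:

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    # Sparsity-inducing L1 term over all BatchNorm scaling factors (gamma);
    # add to the task loss: loss = task_loss + bn_l1_penalty(model)
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def channels_to_keep(bn: nn.BatchNorm2d, threshold: float) -> torch.Tensor:
    # Channels with a scaling factor above the threshold survive pruning.
    return (bn.weight.detach().abs() > threshold).nonzero(as_tuple=True)[0]
```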
A more readable and flexible YOLOv5 with additional backbones (GCN, ResNet, ShuffleNet, MobileNet, EfficientNet, HRNet, Swin Transformer, etc.), extra modules (CBAM, DCN, and so on), and TensorRT support
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
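The heavy-hitter idea, sketched from the paper's description (a hypothetical helper, not the authors' code): keep a recent window plus the cached tokens that have accumulated the most attention mass, and evict the rest of the KV cache:

```python
import torch

def h2o_keep_indices(attn_weights: torch.Tensor, budget: int, recent: int) -> torch.Tensor:
    """attn_weights: (num_queries, seq_len) attention probabilities over cached tokens."""
    seq_len = attn_weights.size(-1)
    if seq_len <= budget:
        return torch.arange(seq_len)
    # Accumulated attention each cached token has received ("heavy hitters").
    scores = attn_weights.sum(dim=0)
    scores[seq_len - recent:] = float("inf")  # always keep the recent window
    keep = torch.topk(scores, k=budget).indices
    return torch.sort(keep).values

attn = torch.softmax(torch.randn(4, 32), dim=-1)
print(h2o_keep_indices(attn, budget=8, recent=4))
```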
Caffe for Sparse and Low-rank Deep Neural Networks
An innovative library for efficient LLM inference via low-bit quantization
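An illustrative sketch of group-wise INT4 weight quantization, the kind of low-bit scheme such libraries implement with fused kernels (hypothetical helper names, not this project's API):

```python
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128):
    w_g = w.reshape(-1, group_size)                    # one scale per group
    scale = w_g.abs().max(dim=1, keepdim=True).values / 7.0
    q = torch.clamp((w_g / scale).round(), -8, 7)      # 4-bit signed range
    return q.to(torch.int8), scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor, shape):
    return (q.float() * scale).reshape(shape)

w = torch.randn(256, 256)
q, s = quantize_int4_groupwise(w)
print((w - dequantize_int4(q, s, w.shape)).abs().mean())
```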
#Computer Science#Reference ImageNet implementation of the SelecSLS CNN architecture proposed in the SIGGRAPH 2020 paper "XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera". The repository also inclu...
Sparse Optimisation Research Code
#Computer Science#Always sparse. Never dense. But never say never. A Sparse Training repository for the Adaptive Sparse Connectivity concept and its algorithmic instantiation, i.e. Sparse Evolutionary Training, to boos...
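One prune-and-regrow step in the spirit of Sparse Evolutionary Training, sketched from the concept (not the repository's code): drop the weakest fraction of active connections, then regrow the same number at random positions:

```python
import torch

def set_prune_regrow(weight: torch.Tensor, mask: torch.Tensor, zeta: float = 0.3) -> None:
    """One prune-and-regrow step on a 2-D weight matrix and its binary mask."""
    active = mask.nonzero(as_tuple=False)
    n = int(zeta * active.size(0))
    # 1) Drop the n active connections with the smallest magnitude.
    mags = weight[active[:, 0], active[:, 1]].abs()
    drop = active[torch.topk(mags, n, largest=False).indices]
    mask[drop[:, 0], drop[:, 1]] = 0
    # 2) Regrow n connections at random currently-inactive positions.
    inactive = (mask == 0).nonzero(as_tuple=False)
    grow = inactive[torch.randperm(inactive.size(0))[:n]]
    mask[grow[:, 0], grow[:, 1]] = 1
    # 3) Zero dropped weights; give regrown ones a small random init.
    weight.data.mul_(mask)
    weight.data[grow[:, 0], grow[:, 1]] = 0.01 * torch.randn(n)

w = torch.randn(32, 32)
m = (torch.rand(32, 32) < 0.1).float()  # start ~10% dense
w = w * m
set_prune_regrow(w, m)
```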
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
Caffe for Sparse Convolutional Neural Network
#Computer Science#Sparse and structured neural attention mechanisms
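The canonical example of such a mapping is sparsemax (Martins & Astudillo, 2016), which projects scores onto the probability simplex and can assign exactly zero weight; a self-contained PyTorch sketch, not the repository's implementation:

```python
import torch

def sparsemax(z: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Sort scores in descending order along the chosen dimension.
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    cumsum = z_sorted.cumsum(dim=dim)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)  # reshape for broadcasting along `dim`
    # The condition 1 + k*z_(k) > cumsum_k identifies the support (a prefix).
    support = 1 + k * z_sorted > cumsum
    k_z = support.sum(dim=dim, keepdim=True).to(z.dtype)
    # Threshold tau chosen so the output sums to one.
    tau = (torch.where(support, z_sorted, torch.zeros_like(z_sorted))
           .sum(dim=dim, keepdim=True) - 1) / k_z
    return torch.clamp(z - tau, min=0.0)

print(sparsemax(torch.tensor([2.0, 1.0, 0.1])))  # tensor([1., 0., 0.])
```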
#Computer Science#A research library for PyTorch-based neural network pruning, compression, and more.