SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
PyTorch native quantization and sparsity for training and inference
PaddleSlim is an open-source library for deep model compression and architecture search.