[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
An easy-to-use package for implementing SmoothQuant for LLMs
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Works with SmoothQuant and LLM-AWQ.
Inject errors into LLMs
📖A curated list of Awesome LLM Inference Papers with codes. 🎉🎉
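For context on the SmoothQuant entry above: the core idea of the paper is to migrate activation outliers into the weights via a per-channel scaling that leaves the layer's output mathematically unchanged, making both tensors easier to quantize. Below is a minimal NumPy sketch of that smoothing transform; `smooth` is a hypothetical helper written for illustration, not the API of any repository listed here.

```python
import numpy as np

def smooth(x, w, alpha=0.5):
    """SmoothQuant-style smoothing (illustrative, not a repo API).

    x: activations, shape (tokens, d_in); w: weight, shape (d_in, d_out).
    Uses per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha)
    so that (x / s) @ (s * w) == x @ w, but with flattened activation outliers.
    """
    act_max = np.abs(x).max(axis=0)          # per-channel activation range
    w_max = np.abs(w).max(axis=1)            # per-input-channel weight range
    s = act_max**alpha / w_max**(1 - alpha)  # smoothing factors, shape (d_in,)
    return x / s, w * s[:, None]             # equivalent layer, easier to quantize

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8)) * np.array([1, 1, 50, 1, 1, 1, 1, 1.0])  # outlier channel 2
w = rng.normal(size=(8, 3))
x_s, w_s = smooth(x, w)
assert np.allclose(x @ w, x_s @ w_s)  # product unchanged by smoothing
```

The outlier channel's activation range shrinks after smoothing, which is what lets INT8 activation quantization keep accuracy in the SmoothQuant setting.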