State-of-the-art low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques for TensorFlow, PyTorch, and ONNX Runtime
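As a reference for what low-bit integer quantization means in practice, here is a minimal NumPy sketch of symmetric per-tensor INT8 quantization; the function names are illustrative and this is the generic technique, not any particular toolkit's API (real toolkits add per-channel scales, calibration, and fused kernels).

```python
# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative only).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values onto [-127, 127] with a single scale factor."""
    scale = max(float(np.abs(x).max()) / 127.0, 1e-12)  # guard all-zero input
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the INT8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize_int8(q, s)).max())
```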
A script to convert floating-point CNN models into the generalized low-precision ShiftCNN representation
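The core idea behind ShiftCNN-style representations is to constrain weights to (sums of) powers of two so that multiplications become bit shifts. The sketch below rounds each weight to a single nearest signed power of two in log space; the one-term restriction and names are my own simplification, as the actual method sums several power-of-two terms per weight via a codebook.

```python
# Simplified one-term power-of-two weight quantization in the spirit of
# ShiftCNN: with weights restricted to +/- 2^k, multiplies become shifts.
import numpy as np

def round_to_pow2(w: np.ndarray) -> np.ndarray:
    """Round each nonzero weight to the nearest signed power of two (in log space)."""
    sign = np.sign(w)
    mag = np.abs(w)
    exp = np.where(mag > 0, np.round(np.log2(np.maximum(mag, 1e-38))), 0)
    return np.where(mag > 0, sign * np.exp2(exp), 0.0)

w = np.array([0.3, -0.7, 0.05, 1.9], dtype=np.float32)
print(round_to_pow2(w))   # [ 0.25  -0.5    0.0625  2. ]
```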
#Large Language Model# JAX Scalify: end-to-end scaled arithmetic
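Scaled arithmetic keeps a tensor as a pair (data, scale) whose implicit value is data × scale, propagating scales through operations so the data payload stays in a narrow dynamic range suitable for low-precision formats. Below is a toy NumPy sketch of the idea; the class and function names are illustrative, not Scalify's API.

```python
# Toy scaled arithmetic: value == data * scale, with scales propagated
# explicitly so `data` can live in a low-precision format without overflow.
import numpy as np
from dataclasses import dataclass

@dataclass
class Scaled:
    data: np.ndarray   # narrow-range payload (could be fp16/fp8 in practice)
    scale: float       # scale factor carried out-of-band

    def value(self) -> np.ndarray:
        return self.data * self.scale

def smul(a: Scaled, b: Scaled) -> Scaled:
    # Multiply payloads and scales separately: payloads stay well-conditioned.
    return Scaled(a.data * b.data, a.scale * b.scale)

a = Scaled(np.array([0.5, -0.25]), scale=2.0**10)   # represents large values
b = Scaled(np.array([0.5,  0.5]),  scale=2.0**-12)  # represents tiny values
c = smul(a, b)
print(c.value())  # matches a.value() * b.value(), no extreme intermediates
```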
Code for a DNN feature-map compression paper
CUDA/HIP header-only library that streamlines working with vector and low-precision floating-point types (16-bit, 8-bit) on GPUs
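To get a feel for what 16-bit floating point does to values, here is a host-side NumPy illustration of the rounding a float32 → float16 cast introduces; this is generic float16 behavior shown for context, not the library's API, which wraps the equivalent device-side types and conversions.

```python
# Host-side illustration of 16-bit float rounding; GPU libraries wrap the
# equivalent device-side types (e.g. half) and their conversions.
import numpy as np

x32 = np.float32(0.1) + np.float32(0.2)    # ~0.3 in float32
x16 = np.float16(x32)                      # round to nearest float16

print(f"float32: {x32:.8f}")               # 0.30000001
print(f"float16: {np.float32(x16):.8f}")   # 0.30004883 -- ~3 decimal digits
print("rounding error:", abs(np.float32(x16) - x32))
```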
#Natural Language Processing# LinearCosine: adding beats multiplying for lower-precision, efficient cosine similarity
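For reference, the exact quantity LinearCosine approximates is ordinary cosine similarity, the multiply-heavy baseline shown below in NumPy; the repo's addition-based approximation itself is not reproduced here.

```python
# Exact cosine similarity: the multiply-accumulate baseline that
# addition-based approximations aim to cheapen at lower precision.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (||a|| * ||b||)"""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))   # 1.0 for parallel vectors
```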