PyTorch native quantization and sparsity for training and inference
A C library for converting between the float8 (8-bit minifloat) and float32 (single-precision floating-point) formats.
Minifloat (8-bit float) implementation in Go