A library for efficient LLM inference via low-bit quantization
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
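A minimal sketch of the underlying idea, not this project's actual code: weights are quantized per-tensor to float8 e4m3 and dequantized to bfloat16 for the matmul, so activations and accumulation stay in a half-precision format. Written in JAX for illustration; `quantize_fp8`, `fp8_matmul`, and the shapes are hypothetical names and values.

```python
import numpy as np
import jax.numpy as jnp

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3

def quantize_fp8(w):
    # Per-tensor symmetric scale so the largest weight maps to the fp8 max.
    scale = jnp.max(jnp.abs(w)) / E4M3_MAX
    w_q = (w / scale).astype(jnp.float8_e4m3fn)  # 1 byte per weight
    return w_q, scale

def fp8_matmul(x, w_q, scale):
    # Dequantize to bfloat16 and run the matmul in half precision
    # (some backends may still accumulate internally in fp32).
    w = w_q.astype(jnp.bfloat16) * scale.astype(jnp.bfloat16)
    return x.astype(jnp.bfloat16) @ w

x = jnp.asarray(np.random.randn(4, 64), dtype=jnp.bfloat16)
w = jnp.asarray(np.random.randn(64, 256), dtype=jnp.float32)
w_q, s = quantize_fp8(w)
y = fp8_matmul(x, w_q, s)
print(y.shape, y.dtype)  # (4, 256) bfloat16
```

The speedup on consumer GPUs comes from halving weight memory traffic and using fp8/fp16 tensor-core paths; the sketch above only shows the numerics, not the fused kernels.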
#Large Language Models# JAX Scalify: end-to-end scaled arithmetic
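The core idea of scaled arithmetic is to represent each tensor as a (data, scale) pair and propagate the scales through every op, keeping the data payload near unit magnitude so low-precision formats remain usable end to end. Below is a hypothetical JAX sketch of that representation; `ScaledArray`, `to_scaled`, and `scaled_matmul` are illustrative names, not Scalify's actual API.

```python
from dataclasses import dataclass
import jax
import jax.numpy as jnp

@dataclass
class ScaledArray:
    # A tensor stored as data * scale: payload near O(1), scale carried separately.
    data: jax.Array
    scale: jax.Array  # scalar

def to_scaled(x: jax.Array) -> ScaledArray:
    # Pull out a power-of-two scale so the payload sits near unit magnitude.
    scale = 2.0 ** jnp.round(jnp.log2(jnp.max(jnp.abs(x))))
    return ScaledArray(x / scale, scale)

def scaled_matmul(a: ScaledArray, b: ScaledArray) -> ScaledArray:
    # Multiply the unit-scale payloads; combine the scales symbolically,
    # so no intermediate ever leaves the numerically safe range.
    return ScaledArray(a.data @ b.data, a.scale * b.scale)

a = to_scaled(jnp.ones((8, 8)) * 1e4)
b = to_scaled(jnp.ones((8, 8)) * 1e-3)
c = scaled_matmul(a, b)
print(c.data[0, 0] * c.scale)  # ≈ 80.0, i.e. 8 * 1e4 * 1e-3
```

Because the payloads stay near 1.0, they could be cast to fp16 or fp8 without overflow or underflow, which is what "end-to-end scaled arithmetic" buys over plain mixed precision.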