A library for efficient LLM inference via low-bit quantization
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
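A minimal sketch of the underlying idea, not this project's actual code: weights are quantized per-tensor to float8 e4m3 and dequantized to bfloat16 for the matmul, so activations and accumulation stay in a half-precision format. Written in JAX for illustration; `quantize_fp8`, `fp8_matmul`, and the shapes are hypothetical names and values.

```python
import numpy as np
import jax.numpy as jnp

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3

def quantize_fp8(w):
    # Per-tensor symmetric scale so the largest weight maps to the fp8 max.
    scale = jnp.max(jnp.abs(w)) / E4M3_MAX
    w_q = (w / scale).astype(jnp.float8_e4m3fn)  # 1 byte per weight
    return w_q, scale

def fp8_matmul(x, w_q, scale):
    # Dequantize to bfloat16 and run the matmul in half precision
    # (some backends may still accumulate internally in fp32).
    w = w_q.astype(jnp.bfloat16) * scale.astype(jnp.bfloat16)
    return x.astype(jnp.bfloat16) @ w

x = jnp.asarray(np.random.randn(4, 64), dtype=jnp.bfloat16)
w = jnp.asarray(np.random.randn(64, 256), dtype=jnp.float32)
w_q, s = quantize_fp8(w)
y = fp8_matmul(x, w_q, s)
print(y.shape, y.dtype)  # (4, 256) bfloat16
```

The speedup on consumer GPUs comes from halving weight memory traffic and using fp8/fp16 tensor-core paths; the sketch above only shows the numerics, not the fused kernels.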
#Large Language Models# JAX Scalify: end-to-end scaled arithmetic
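The core idea of scaled arithmetic is to represent each tensor as a (data, scale) pair and propagate the scales through every op, keeping the data payload near unit magnitude so low-precision formats remain usable end to end. Below is a hypothetical JAX sketch of that representation; `ScaledArray`, `to_scaled`, and `scaled_matmul` are illustrative names, not Scalify's actual API.

```python
from dataclasses import dataclass
import jax
import jax.numpy as jnp

@dataclass
class ScaledArray:
    # A tensor stored as data * scale: payload near O(1), scale carried separately.
    data: jax.Array
    scale: jax.Array  # scalar

def to_scaled(x: jax.Array) -> ScaledArray:
    # Pull out a power-of-two scale so the payload sits near unit magnitude.
    scale = 2.0 ** jnp.round(jnp.log2(jnp.max(jnp.abs(x))))
    return ScaledArray(x / scale, scale)

def scaled_matmul(a: ScaledArray, b: ScaledArray) -> ScaledArray:
    # Multiply the unit-scale payloads; combine the scales symbolically,
    # so no intermediate ever leaves the numerically safe range.
    return ScaledArray(a.data @ b.data, a.scale * b.scale)

a = to_scaled(jnp.ones((8, 8)) * 1e4)
b = to_scaled(jnp.ones((8, 8)) * 1e-3)
c = scaled_matmul(a, b)
print(c.data[0, 0] * c.scale)  # ≈ 80.0, i.e. 8 * 1e4 * 1e-3
```

Because the payloads stay near 1.0, they could be cast to fp16 or fp8 without overflow or underflow, which is what "end-to-end scaled arithmetic" buys over plain mixed precision.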