Flux diffusion model implementation using quantized FP8 matmul; the remaining layers use faster half-precision accumulation, making it ~2x faster on consumer devices.
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
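A minimal sketch of the emulation idea (not this extension's own API): round an FP32 tensor through PyTorch's built-in e4m3 dtype and cast it back, so the subsequent compute still runs in FP32 on any hardware.

```python
# Generic FP8 emulation sketch using stock PyTorch dtypes (assumes PyTorch >= 2.1);
# illustrates the concept only and is not this extension's API.
import torch

x = torch.randn(4, 4, dtype=torch.float32)
x_fp8 = x.to(torch.float8_e4m3fn)    # round to 8-bit float (4 exponent, 3 mantissa bits)
x_deq = x_fp8.to(torch.float32)      # cast back so later ops run in FP32
print("max rounding error:", (x - x_deq).abs().max().item())
```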
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including FP8) and easy-to-configure FSDP and DeepSpeed support
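A minimal training-loop sketch with the Accelerate API; the model, optimizer, and data below are placeholders, and `mixed_precision="fp8"` additionally requires an FP8 backend such as Transformer Engine or MS-AMP, so the sketch defaults to bf16.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# "fp8" needs an FP8-capable backend (e.g. Transformer Engine); "bf16"/"fp16" work broadly.
accelerator = Accelerator(mixed_precision="bf16")

model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

# prepare() places everything on the right device(s) and wraps for DDP/FSDP/DeepSpeed.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # handles gradient scaling/unscaling under mixed precision
    optimizer.step()
    optimizer.zero_grad()
```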
Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
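A hedged sketch of how low-bit loading typically looks with ipex-llm; the checkpoint id and the exact `load_in_low_bit` values are assumptions, so check the project docs for the current API.

```python
# Assumed ipex-llm usage: its drop-in AutoModelForCausalLM loads weights in a low-bit
# format such as FP8 or INT4; the model id and option value here are illustrative.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_low_bit="fp8")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("FP8 quantization lets us", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```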
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference
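A minimal sketch of FP8 execution with Transformer Engine's PyTorch API; the layer sizes and recipe settings are illustrative, and an FP8-capable GPU (Hopper or Ada) is required.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Delayed-scaling recipe: E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(512, 1024, device="cuda", requires_grad=True)

# FP8 GEMMs only run inside the autocast context; outside it the layer behaves normally.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```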
An implementation of FP8 flash attention on the Ada architecture using the CUTLASS repository
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
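A hedged sketch of the kind of inference recipe involved: weight-only FP8 quantization of a diffusers pipeline's transformer with torchao. The config name `float8_weight_only`, the checkpoint id, and the prompt are assumptions; the repo's actual recipes cover more, including FP8 training.

```python
# Assumed torchao + diffusers inference recipe; config names may differ across versions.
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, float8_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
quantize_(pipe.transformer, float8_weight_only())  # swap Linear weights to FP8
pipe.to("cuda")

image = pipe("a watercolor fox in a snowy forest", num_inference_steps=28).images[0]
image.save("fox.png")
```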