GitHub Chinese Community

Search results for "fp8"

DeepGEMM · @deepseek-ai
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Python · 5.51k stars · updated 7 days ago
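
For context, a minimal sketch of the fine-grained scaling idea in plain PyTorch, not DeepGEMM's CUDA kernels: each fixed-size block of a tensor gets its own FP8 scale, so a single outlier only degrades its own block. The block size of 128 and the E4M3 format are assumptions for illustration.

```python
import torch

def quantize_fp8_blockwise(x: torch.Tensor, block: int = 128):
    """Per-block FP8 (E4M3) quantization: every `block`-sized chunk of the
    flattened tensor gets its own scale (fine-grained scaling)."""
    fp8_max = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3
    x_blocks = x.reshape(-1, block)
    # One scale per block, so an outlier only affects its own chunk.
    scale = x_blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / fp8_max
    q = (x_blocks / scale).to(torch.float8_e4m3fn)
    return q, scale

x = torch.randn(4, 256)
q, scale = quantize_fp8_blockwise(x)
x_hat = (q.float() * scale).view(x.shape)  # dequantize for a round-trip check
print((x - x_hat).abs().max())
```
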

Related topics: PyTorch, quantization, flux, fp8, deepseek, sparsity, pruning, gptq, text-to-image


flux-fp8-api · @aredden
Flux diffusion model implementation using quantized FP8 matmul; the remaining layers use faster half-precision accumulation, making it roughly 2x faster on consumer devices.
Topics: diffusion, flux, fp8, PyTorch
Python · 269 stars · updated 9 months ago
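
A hedged sketch of FP8 matmul with fast (reduced-precision) accumulation via torch._scaled_mm, which is a private PyTorch API whose signature has changed between releases; this assumes PyTorch 2.4+ and an FP8-capable GPU (Ada or Hopper):

```python
import torch

# a must be row-major, b column-major for _scaled_mm.
a = torch.randn(16, 32, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(64, 32, device="cuda").to(torch.float8_e4m3fn).t()
scale_a = torch.tensor(1.0, device="cuda")
scale_b = torch.tensor(1.0, device="cuda")
out = torch._scaled_mm(
    a, b, scale_a=scale_a, scale_b=scale_b,
    out_dtype=torch.bfloat16,
    use_fast_accum=True,  # faster, lower-precision accumulation, as the repo describes
)
print(out.shape)  # (16, 64)
```
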
Tutel · Microsoft (@microsoft)
Tutel MoE: an optimized Mixture-of-Experts library, with support for DeepSeek FP8/FP4
Topics: PyTorch, moe, mixture-of-experts, deepseek, LLM
C · 850 stars · updated 8 days ago
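
To illustrate the pattern Tutel optimizes, here is a minimal top-2 expert-routing sketch in plain PyTorch; this is not Tutel's API, and all sizes are made up:

```python
import torch
import torch.nn.functional as F

n_tokens, d_model, n_experts, k = 8, 16, 4, 2
x = torch.randn(n_tokens, d_model)
gate = torch.nn.Linear(d_model, n_experts)
experts = torch.nn.ModuleList(
    torch.nn.Linear(d_model, d_model) for _ in range(n_experts)
)

logits = gate(x)                       # (tokens, experts)
weights, idx = logits.topk(k, dim=-1)  # route each token to its top-k experts
weights = F.softmax(weights, dim=-1)

out = torch.zeros_like(x)
for e, expert in enumerate(experts):
    for slot in range(k):
        mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
        if mask.any():
            out[mask] += weights[mask, slot, None] * expert(x[mask])
```
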
FP8-Emulation-Toolkit (archived) · @IntelLabs
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
Python · 110 stars · updated 7 months ago
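
PyTorch's native float8 dtypes already allow a crude version of this kind of emulation on ordinary hardware, since casting works without FP8 silicon. A small stand-in sketch (not the toolkit's own API) that measures E4M3 vs. E5M2 rounding error:

```python
import torch

x = torch.randn(10_000)
for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
    x_hat = x.to(dtype).to(torch.float32)  # quantize, then dequantize
    err = (x - x_hat).abs().mean()
    print(f"{dtype}: mean abs rounding error = {err.item():.5f}")
```
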
neural-compressor · Intel Corporation (@intel)
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) and sparsity; leading model-compression techniques for TensorFlow, PyTorch, and ONNX Runtime
Topics: low-precision, pruning, sparsity, auto-tuning, knowledge-distillation
Python · 2.45k stars · updated 42 minutes ago
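
A rough sketch of neural-compressor's post-training quantization entry point; the model and calibration data here are placeholders, and the exact config options (including the FP8 settings) vary by version:

```python
import torch
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

# Placeholder model and calibration loader for illustration only.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
calib = torch.utils.data.DataLoader(
    [(torch.randn(16), 0) for _ in range(32)], batch_size=8
)

# The default config does INT8 post-training quantization; FP8 and the
# other low-bit schemes are selected through the config options.
q_model = fit(model=model, conf=PostTrainingQuantConfig(), calib_dataloader=calib)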
FP8-quantization · @Qualcomm-AI-research
Python · 151 stars · updated 2 years ago

accelerate · Hugging Face (@huggingface)
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including FP8) and easy-to-configure FSDP and DeepSpeed support
Python · 8.91k stars · updated 2 days ago
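
A minimal training-loop sketch with Accelerate; note that mixed_precision="fp8" additionally requires an FP8 backend such as TransformerEngine or MS-AMP and a supported GPU, otherwise fall back to "bf16"/"fp16":

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp8")

model = torch.nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = torch.utils.data.DataLoader(
    [torch.randn(128) for _ in range(64)], batch_size=8
)

# prepare() moves everything to the right device and wires up the precision backend.
model, optimizer, data = accelerator.prepare(model, optimizer, data)
for batch in data:
    loss = model(batch).pow(2).mean()
    accelerator.backward(loss)  # handles scaling/precision for you
    optimizer.step()
    optimizer.zero_grad()
```
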
ipex-llm-tutorial · Intel Corporation (@intel)
Accelerate LLMs with low-bit (FP4/INT4/FP8/INT8) optimizations using ipex-llm
Jupyter Notebook · 165 stars · updated 2 months ago
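
A short sketch of ipex-llm's drop-in transformers-style API, assuming ipex-llm is installed; the model id is a placeholder and the accepted low-bit strings may vary by release:

```python
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    load_in_low_bit="fp8",            # also accepts e.g. "sym_int4", "fp4"
    trust_remote_code=True,
)
```
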
TransformerEngine · NVIDIA Corporation (@NVIDIA)
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs, providing better performance with lower memory utilization
Topics: CUDA, deep learning, gpu, machine learning, Python
Python · 2.54k stars · updated 5 hours ago
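
Typical usage, adapted from the pattern shown in TransformerEngine's documentation: swap nn.Linear for te.Linear and run the forward pass under fp8_autocast with an FP8 recipe:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Requires an FP8-capable NVIDIA GPU (Hopper/Ada/Blackwell).
model = te.Linear(768, 3072, bias=True).cuda()
inp = torch.randn(2048, 768, device="cuda")

fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = model(inp)

out.sum().backward()
```
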
cutlass_flash_atten_fp8 · @weishengying
FP8 flash attention implemented with the CUTLASS library on the Ada architecture
Cuda · 71 stars · updated 1 year ago
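
The repo itself is CUDA/CUTLASS; as a concept-only sketch in PyTorch, one can emulate the effect of FP8-quantized Q/K/V by round-tripping through float8_e4m3fn before ordinary scaled-dot-product attention:

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 128, 64)  # (batch, heads, seq, head_dim)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

def fp8_roundtrip(t: torch.Tensor) -> torch.Tensor:
    # Quantize to E4M3 and back, emulating FP8 inputs to attention.
    return t.to(torch.float8_e4m3fn).to(t.dtype)

out = F.scaled_dot_product_attention(
    fp8_roundtrip(q), fp8_roundtrip(k), fp8_roundtrip(v)
)
```
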
diffusers-torchao · @sayakpaul
End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).
Topics: diffusion-models, flux, text-to-image, torch
Python · 345 stars · updated 5 months ago
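
A hedged sketch of the torchao side of such a recipe, assuming a recent torchao and an FP8-capable GPU; float8_weight_only converts Linear weights to FP8 in place (the real recipes apply this to a diffusion pipeline's transformer rather than a toy model):

```python
import torch
from torchao.quantization import quantize_, float8_weight_only

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).cuda()
quantize_(model, float8_weight_only())  # swap Linear weights to FP8 in place
out = model(torch.randn(8, 64, device="cuda"))
```
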