📚 200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
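As a flavor of what such Tensor Core kernels involve, below is a minimal CUDA sketch of a WMMA-based HGEMM tile: one warp computes a 16x16 tile of C = A * B with half inputs and float accumulation. It is illustrative only (not code from the repository) and assumes row-major matrices, M/N/K multiples of 16, and an sm_70+ GPU.

```cuda
// Minimal HGEMM tile kernel using the WMMA (Tensor Core) API.
// Each warp computes one 16x16 tile of C = A * B (half in, float accumulate).
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void wmma_hgemm_16x16x16(const half* A, const half* B, float* C,
                                    int M, int N, int K) {
    // One warp (32 threads) per block; the grid maps 16x16 tiles over C.
    int tile_m = blockIdx.y;  // tile row index into C
    int tile_n = blockIdx.x;  // tile column index into C

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;
    wmma::fill_fragment(c_frag, 0.0f);

    // March along K in 16-wide steps, issuing one Tensor Core MMA per step.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + tile_m * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + tile_n * 16, N);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // c_frag += a_frag * b_frag
    }
    wmma::store_matrix_sync(C + tile_m * 16 * N + tile_n * 16, c_frag, N,
                            wmma::mem_row_major);
}
// Launch: wmma_hgemm_16x16x16<<<dim3(N / 16, M / 16), 32>>>(dA, dB, dC, M, N, K);
```

Production kernels of this kind add shared-memory staging, double buffering, swizzled layouts, and larger warp tiles on top of this basic pattern to approach cuBLAS-level throughput.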
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
#Large Language Models# 🔥🔥🔥 A collection of awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, and High-Performance Computing (HPC) projects.
#Large Language Models# Performance of the C++ interface of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
#Computer Science# GEMM and Winograd-based convolutions using CUTLASS.
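For orientation, the sketch below shows the basic CUTLASS 2.x device-level GEMM API that such kernels build on; the wrapper name run_gemm and the single-precision, row-major, defaulted (SIMT) configuration are assumptions for illustration, not the repository's actual code.

```cuda
// Minimal single-precision GEMM (D = alpha * A * B + beta * C) via the
// CUTLASS 2.x device-level API. Illustrative sketch only.
#include "cutlass/gemm/device/gemm.h"

cutlass::Status run_gemm(int M, int N, int K,
                         const float* A, const float* B, float* C,
                         float alpha = 1.0f, float beta = 0.0f) {
    using Gemm = cutlass::gemm::device::Gemm<
        float, cutlass::layout::RowMajor,   // A: M x K
        float, cutlass::layout::RowMajor,   // B: K x N
        float, cutlass::layout::RowMajor>;  // C/D: M x N

    // Problem size, tensor refs {pointer, leading dimension}, epilogue scalars.
    Gemm::Arguments args({M, N, K},
                         {A, K}, {B, N}, {C, N}, {C, N},
                         {alpha, beta});
    Gemm gemm_op;
    return gemm_op(args);  // launches the GEMM kernel on the default stream
}
```

Winograd convolution is typically layered on top of this: input and filter tiles are transformed, the element-wise products in the transformed domain are carried out as batched GEMMs of this kind, and an inverse transform produces the output tiles.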
A CUTLASS CuTe implementation of a head-dim-64 FlashAttention-2 TensorRT plugin for LightGlue. Runs on Jetson Orin NX 8GB with TensorRT 8.5.2.
A PyTorch implementation of block-sparse operations.
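To make "block sparse" concrete, the sketch below is a plain CUDA kernel for a block-sparse (BSR format) times dense matmul, the kind of operation a PyTorch block-sparse extension typically wraps; the kernel name and layout are illustrative assumptions, not the repository's implementation.

```cuda
// Block-sparse (BSR) x dense matmul: C[M,N] = A[M,K] * B[K,N], where A is stored
// as BS x BS blocks: row_ptr over block rows, col_idx of block columns, and vals
// holding each nonzero block row-major. Illustrative sketch only.
__global__ void bsr_spmm(const int* row_ptr, const int* col_idx, const float* vals,
                         const float* B, float* C, int M, int N, int BS) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    if (row >= M || col >= N) return;

    int brow = row / BS;   // block row containing this output row
    int r = row % BS;      // row offset inside the block
    float acc = 0.0f;

    // Accumulate contributions from every nonzero block in this block row.
    for (int p = row_ptr[brow]; p < row_ptr[brow + 1]; ++p) {
        const float* blk = vals + (size_t)p * BS * BS;
        int kbase = col_idx[p] * BS;  // first dense-K index covered by this block
        for (int k = 0; k < BS; ++k)
            acc += blk[r * BS + k] * B[(kbase + k) * N + col];
    }
    C[row * N + col] = acc;
}
// Launch: dim3 block(16, 16), grid((N + 15) / 16, (M + 15) / 16);
// bsr_spmm<<<grid, block>>>(d_row_ptr, d_col_idx, d_vals, dB, dC, M, N, BS);
```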