Taskflow helps you quickly write parallel and heterogeneous task programs in modern C++
Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
📚Modern CUDA Learn Notes: 200+ Tensor/CUDA Cores Kernels🎉, HGEMM, FA2 via MMA and CuTe, 98~100% TFLOPS of cuBLAS/FA2.
Sample code for my CUDA programming book
#Computer Science#TinyChatEngine: On-Device LLM Inference Library
Thin, unified, C++-flavored wrappers for the CUDA APIs
Safe Rust wrapper around the CUDA toolkit
#Computer Science#Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
#Large Language Models#LLM notes, including model inference, transformer model structure, and LLM framework code analysis notes.
#Computer Science#A self-learning tutorial for CUDA High Performance Programming.
A simple GPU hash table implemented in CUDA using lock-free techniques
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
#Computer Science#From zero to hero CUDA for accelerating maths and machine learning on GPU.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels that provides an on-the-fly view of the assembly.
An implementation of HIP that works on CPUs, across OSes.
Accelerated General (FP32) Matrix Multiplication from scratch in CUDA
#Algorithm Practice#CUDA kernel author's tools