LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
📚 200+ Tensor Core/CUDA Core kernels, including ⚡️ flash-attn-mma and ⚡️ hgemm implementations with WMMA, MMA, and CuTe, reaching 98%–100% of cuBLAS/FlashAttention-2 TFLOPS 🎉.
Deep learning in Rust, with shape-checked tensors and neural networks.
Safe Rust wrapper around the CUDA toolkit.
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
An archive of materials produced for an introductory class on CUDA programming taught at Stanford University in 2010.
From-zero-to-hero CUDA for accelerating math and machine learning on the GPU.
Amplifier allows .NET developers to easily run complex applications with intensive mathematical computation on Intel CPUs/GPUs and NVIDIA and AMD hardware without writing any additional C kernel code. Write your funct...
Some CUDA design patterns and a bit of template magic.
Spiking Neural Networks in C++ with strong GPU acceleration through CUDA.
Tools for CUDA kernel authors.
An open-source, cross-platform compiler from Microsoft Research for the compute-intensive loops used in AI algorithms.
A Triton implementation of FlashAttention-2 that adds support for custom masks.
High-speed GEMV kernels, achieving up to a 2.7x speedup over the PyTorch baseline.
A tool for examining GPU scheduling behavior.
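
To illustrate the kind of kernel the GEMV and GEMM entries above optimize, here is a minimal, hypothetical CUDA sketch (not taken from any listed repo): a naive single-precision GEMV with one thread per output row. Production kernels like those in the repos above instead use warp-level reductions, vectorized loads, and Tensor Core (WMMA/MMA) paths.

```cuda
// Naive GEMV sketch: y = A * x, with A stored row-major (rows x cols).
// One thread computes one output row; purely illustrative, not optimized.
__global__ void sgemv_naive(const float *A, const float *x, float *y,
                            int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float acc = 0.0f;
    for (int c = 0; c < cols; ++c)
        acc += A[row * cols + c] * x[c];  // dot product of row with x
    y[row] = acc;
}

// Launch example: one 256-thread block per 256 rows.
// int threads = 256;
// int blocks  = (rows + threads - 1) / threads;
// sgemv_naive<<<blocks, threads>>>(dA, dx, dy, rows, cols);
```

The speedups quoted above come largely from replacing this per-thread inner loop with coalesced, vectorized reads and warp-shuffle reductions, which is why hand-written kernels can beat generic framework baselines.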