μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, and automatic CUDA graph generation and updating.
Efficient CUDA implementations of the Merge Sort and Bitonic Sort algorithms for parallel processing on the GPU, accelerating the sorting of large arrays. Includes both CPU and GPU versions, along ...
A C++ header-only library for parallel linear algebra on GPUs (CUDA/cuBLAS under the hood)
Learning to develop a lightning-fast C++/CUDA neural network
A beginner's guide to CUDA programming
This repo contains CUDA C++ code examples that demonstrate how to use GPUs for parallel computing, covering topics such as dynamic parallelism, optimization, etc.
Test GPU performance on linear algebra operations and compare the results with C++/Fortran.