Learning how to write "Less Slow" code in C++20, C99, CUDA, PTX, & Assembly, from numerics & SIMD to coroutines, ranges, exception handling, networking, and user-space IO
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly.
Free software file format parser for Avid ProTools sessions
A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
Energinet's Model Testbench. Automate grid compliance studies in PSCAD and PowerFactory.
Set of examples written for hardware acceleration via TornadoVM
Inline PTX Assembly in CUDA example
Bloch's equations and Optimal Control for MRI and NMR applications
Visual Studio Code extension with PTX assembly syntax support
FastPtx: a Python pTx pulse design tool for freely optimizing RF and gradient pulses with automatic differentiation
Unofficial Golang client library for the Public Transport Data eXchange (PTX) platform
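🎉 Continuously updated: study notes on CUDA 12.2 PTX-ISA-8.2, with partial Chinese translation, personal interpretation, and inline assembly examples, explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; work in progress.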