rocm

#Large Language Models# A high-throughput and memory-efficient inference and serving engine for LLMs

gpt LLM PyTorch llmops mlops model-serving transformer llm-serving inference llama amd rocm CUDA inferentia trainium tpu xpu hpu deepseek qwen

Python 45.13 k

1 hour ago

apache / tvm

#Computer Science# Open deep learning compiler stack for cpu, gpu and specialized accelerators


compiler tensor deep-learning gpu opencl metal performance JavaScript rocm tvm vulkan spirv machine-learning

Python 12.21 k

8 hours ago

cupy / cupy

NumPy & SciPy for GPU


CUDA cudnn cublas cusolver nccl Python NumPy cupy curand cusparse gpu SciPy tensor rocm

Python 10.12 k

14 hours ago

dmlc / nnvm

#Computer Science# Moved to https://github.com/dmlc/tvm/

computation-graph deep-learning optimization deployment nnvm tvm CUDA opencl rocm metal

C++ 1.66 k

7 years ago

deepmodeling / deepmd-kit

#Computer Science# A deep learning package for many-body potential energy representation and molecular dynamics


deep-learning Molecular Dynamics deepmd lammps potential-energy Python Tensorflow C++ CUDA rocm computational-chemistry materials-science C Node.js PyTorch jax paddle

Python 1.63 k

12 hours ago

aphrodite-engine / aphrodite-engine

#Computer Science# Large-scale LLM inference engine

API inference-engine machine-learning CUDA inferentia rocm intel lora speculative-decoding tpu

C++ 1.38 k

20 hours ago

stotko / stdgpu

stdgpu: Efficient STL-like Data Structures on the GPU

gpu gpu-computing gpu-acceleration gpgpu data-structures stl stl-like stl-containers C++ modern-cpp CUDA openmp rocm hip

C++ 1.21 k

2 days ago

ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform

rocm Docker

Shell 459

2 months ago

alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration 🦙

CUDA hpc gpu rocm hip openmp heterogeneous-parallel-programming C++ header-only tbb

C++ 373

21 days ago

ROCm / rocBLAS

Next generation BLAS implementation for ROCm platform

blas rocm hip

C++ 362

3 days ago

agenium-scale / nsimd

Agenium Scale vectorization library for CPUs and GPUs

simd simd-programming sse2 sse42 avx avx2 avx512 neon aarch64 simd-instructions CUDA rocm C++ hpc simd-library

C 331

3 years ago

QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance p...

quantum-monte-carlo C++ high-performance-computing quantum-chemistry CUDA gpu hpc mpi rocm oneapi

C++ 329

3 days ago

ROCm / k8s-device-plugin

Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster

Kubernetes rocm

Go 320

6 days ago

JuliaGPU / AMDGPU.jl

AMD GPU (ROCm) programming in Julia

Julia rocm amdgpu gpu gpu-programming

Julia 299

13 days ago

ROCm / aomp

AOMP is an open source Clang/LLVM based compiler with added support for the OpenMP® API on Radeon™ GPUs. Use this repository for releases, issues, documentation, packaging, and examples.

amd LLVM clang openmp rocm

Fortran 217

2 days ago

LLNL / hiop

HPC solver for nonlinear optimization problems

hpc nonlinear-optimization nonlinear-programming parallel-programming mpi bfgs constrained-optimization solver optimization CUDA math-physics radiuss rocm

C++ 217

2 days ago

eth-cscs / COSMA

Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm

matrix-multiplication mpi linear-algebra gpu-acceleration CUDA rocm

C++ 206

15 days ago

supranational / sppark

Zero-knowledge template library

CUDA zero-knowledge zero-knowledge-proof zk-snarks rocm

Cuda 195

1 month ago

ROCm / MIVisionX

#Computer Science# MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimize...

machine-vision openvx neural-network opencl rocm inference inference-engine onnx machine-learning ryzen virtual-reality

C++ 193

6 days ago

ROCm / rocFFT

Next generation FFT implementation for ROCm

fft rocm hip amd fast fourier gpu transform

C++ 190

3 days ago
