Low-precision matrix multiplication
llama3 implementation, one matrix multiplication at a time
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Matrix multiplication in CUDA
Some scripts in Python, Java and C++ for matrix multiplication.
Efficient GPU kernels for block-sparse matrix multiplication and convolution
Fast CUDA matrix multiplication from scratch
Butterfly matrix multiplication in PyTorch
Benchmarking matrix multiplication implementations
Code appendix to an OpenCL matrix-multiplication tutorial
Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
Python wrapper for Intel Math Kernel Library (MKL) matrix multiplication
Convolution and Transposed Convolution in a Matrix Multiplication View
Fast matrix multiplication
Python package to accelerate the sparse matrix multiplication and top-n similarity selection
💥 Fast matrix-multiplication as a self-contained Python library – no system dependencies!
General matrix multiplication of f32 and f64 matrices in Rust. Supports matrices with general strides.