A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2
Test the non-AVX, AVX2 and AVX-512 speeds across various active core counts
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading.
An AVX-512 accelerated implementation of the BLAKE3 cryptographic hash function
Code for paper "Base64 encoding and decoding at almost the speed of a memory copy"
An introduction to Intel AVX-512
TensorFlow binaries supporting AVX, FMA, SSE
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging NEON, AVX2, AVX-512, and SWAR to accelerate search, sort, edit distances, alignment scores, etc 🦖
Website for the 512 KB Club
Invaders game in 512 bytes (boot sector)
Example code for Intel AVX / AVX2 intrinsics.
An AVX Lifter for the Hex-Rays Decompiler
AVX-optimized sin(), cos(), exp() and log() functions
Fundamental C++ SIMD types for Intel CPUs (sse, avx, avx2, avx512)