#Natural Language Processing# Tongyi Qianwen-7B (Qwen-7B) is the 7-billion-parameter model in the Tongyi Qianwen large model series developed by Alibaba Cloud.
#Natural Language Processing# Phase 2 of the Chinese LLaMA-2 & Alpaca-2 large model project, plus 64K ultra-long-context models (Chinese LLaMA-2 & Alpaca-2 LLMs with 64K long context models)
#Large Language Models# Official release of the InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
📖 A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, parallelism, etc. 🎉🎉
📚 200+ Tensor Core/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
FlashInfer: Kernel Library for LLM Serving
#Large Language Models# MoBA: Mixture of Block Attention for Long-Context LLMs
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.
[CVPR 2025] The official CLIP training codebase of Inf-CL: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss". A highly memory-efficient CLIP training scheme.
📚 FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ 🎉 vs. SDPA EA.
#Computer Science# A Triton implementation of FlashAttention-2 that adds custom masks.
#Natural Language Processing# Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline parallelism; faster than ZeRO/ZeRO++/FSDP.
#Large Language Models# Benchmarks the performance of the C++ interfaces of FlashAttention and FlashAttention-2 in large language model (LLM) inference scenarios.
#Large Language Models# Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference (a plain-PyTorch reference sketch of GQA decoding follows after this list).
#Natural Language Processing# A fast and memory-efficient PyTorch implementation of the Perceiver with FlashAttention.
A Python package for rematerialization-aware gradient checkpointing (see the checkpointing sketch after this list).
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Utilities for efficient fine-tuning, inference and evaluation of code generation models
A simple PyTorch implementation of flash multi-head attention (see the SDPA-based sketch after this list).
#Computer Science# 🚀 An automated deployment stack for AMD MI300 GPUs with optimized ML/DL frameworks and HPC-ready configurations.
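
A minimal reference sketch of what a decode-stage grouped-query attention (GQA) kernel computes, in plain PyTorch rather than CUDA; shapes and the `gqa_decode_step` helper are assumptions for illustration, not the linked repo's API. One new query token attends over a cached K/V whose head count divides the query head count (MQA is `kv_heads=1`, MHA is `kv_heads=q_heads`).

```python
# Reference sketch only: single decode step of grouped-query attention (GQA)
# against a KV cache. Assumed shapes; not the repo's CUDA implementation.
import torch
import torch.nn.functional as F

def gqa_decode_step(q, k_cache, v_cache):
    # q:       (batch, q_heads, 1, head_dim)        -- the new token's queries
    # k_cache: (batch, kv_heads, seq_len, head_dim)
    # v_cache: (batch, kv_heads, seq_len, head_dim)
    b, q_heads, _, d = q.shape
    kv_heads = k_cache.shape[1]
    group = q_heads // kv_heads
    # Expand each KV head so it serves its group of query heads.
    k = k_cache.repeat_interleave(group, dim=1)
    v = v_cache.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5   # (b, q_heads, 1, seq_len)
    return F.softmax(scores, dim=-1) @ v            # (b, q_heads, 1, head_dim)

q = torch.randn(1, 32, 1, 128)
k_cache = torch.randn(1, 8, 1024, 128)
v_cache = torch.randn(1, 8, 1024, 128)
print(gqa_decode_step(q, k_cache, v_cache).shape)   # torch.Size([1, 32, 1, 128])
```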
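
A minimal sketch of plain activation checkpointing with `torch.utils.checkpoint`, the basic mechanism that rematerialization-aware packages build a smarter recomputation plan on top of; the `Block` module and sizes here are illustrative assumptions.

```python
# Sketch: standard activation checkpointing. Activations inside each wrapped
# block are freed after the forward pass and recomputed during backward.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):  # illustrative residual MLP block
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.net(x)

blocks = nn.ModuleList(Block(512) for _ in range(12))
x = torch.randn(8, 512, requires_grad=True)

h = x
for block in blocks:
    # Recompute this block's activations in backward instead of storing them.
    h = checkpoint(block, h, use_reentrant=False)
h.sum().backward()
```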
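
A minimal sketch of flash multi-head attention in PyTorch, built on `torch.nn.functional.scaled_dot_product_attention` (SDPA), which dispatches to a FlashAttention kernel on supported GPUs and dtypes; the `FlashMHA` module name and dimensions are assumptions, not the linked repo's code.

```python
# Sketch: multi-head attention whose core is SDPA; on CUDA with fp16/bf16 and
# supported head dims, PyTorch selects the FlashAttention backend.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlashMHA(nn.Module):
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor, causal: bool = True) -> torch.Tensor:
        b, s, d = x.shape
        # Project to Q, K, V and split heads: each (b, heads, seq, head_dim).
        qkv = self.qkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        # Fused, memory-efficient attention; no (seq x seq) matrix is stored.
        out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
        return self.proj(out.transpose(1, 2).reshape(b, s, d))

mha = FlashMHA(dim=256, num_heads=8)
x = torch.randn(2, 128, 256)
if torch.cuda.is_available():
    # Half precision on GPU is needed for the FlashAttention backend.
    mha, x = mha.cuda().half(), x.cuda().half()
print(mha(x).shape)  # torch.Size([2, 128, 256])
```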