#Large Language Models#Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
#Natural Language Processing#Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
#Computer Science#Faster Whisper transcription with CTranslate2
#Computer Science#[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ :news: qbot-mini: https://github.com/Charmve/iQuant
#Natural Language Processing#An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Fast inference engine for Transformer models
#Natural Language Processing#Sparsity-aware deep learning inference runtime for CPUs
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
#Large Language Models#Run Mixtral-8x7B models in Colab or on consumer desktops
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
micronet, a model compression and deploy lib. compression: 1、quantization: quantization-aware-training(QAT), High-Bit(>2b)(DoReFa/Quantization and Training of Neural Networks for Efficient Integer-Ari...
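Translation - PyTorch-based model compression (1. quantization: 8/4/2 bits (DoReFa), ternary/binary values (TWN/BNN/XNOR-Net); 2. pruning: normal, regular, and group-convolution channel pruning; 3. group convolution structure; 4. batch-normalization folding for binary-valued features (A))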
PyTorch native quantization and sparsity for training and inference
PaddleSlim is an open-source library for deep model compression and architecture search.
#Large Language Models#INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Efficient computing methods developed by Huawei Noah's Ark Lab
#Large Language Models#Calculate tokens/s and GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
#Large Language Models#[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
#Natural Language Processing#[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
#Large Language Models#FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
#Computer Science#Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, 16-bit CPU inference with GGML)