📖 A curated list of awesome LLM/VLM inference papers with code, such as FlashAttention, PagedAttention, parallelism, etc. 🎉🎉
Official inference framework for 1-bit LLMs
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
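As a quick taste of what "run local LLMs" means in practice, here is a minimal sketch using GPT4All's Python bindings; the GGUF model name is illustrative and is downloaded on first use.

```python
# pip install gpt4all
from gpt4all import GPT4All

# The GGUF filename is illustrative; gpt4all downloads it on first use.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    reply = model.generate("Summarize PagedAttention in one sentence.",
                           max_tokens=128)
print(reply)
```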
#Natural Language Processing# Running Llama 2 and other open-source LLMs locally on CPU for document Q&A
A lightweight LLM inference framework
TinyChatEngine: On-Device LLM Inference Library
Large Language Model (LLM) Inference API and Chatbot
#Large Language Model# 33B Chinese LLM, DPO, QLoRA, 100K context; AirLLM 70B inference on a single 4 GB GPU
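AirLLM fits a 70B model into a few GB of VRAM by streaming one transformer layer to the GPU at a time. A hedged sketch, assuming the airllm package's AutoModel entry point as shown in its README (treat the exact names as assumptions):

```python
# pip install airllm  -- names follow the AirLLM README; treat as assumptions.
from airllm import AutoModel

# Layers are streamed to the GPU one at a time, so peak VRAM stays around
# 4 GB, at the cost of per-token latency.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")
input_ids = model.tokenizer(["What is PagedAttention?"],
                            return_tensors="pt", truncation=True).input_ids
output = model.generate(input_ids.cuda(), max_new_tokens=20)
print(model.tokenizer.decode(output[0]))
```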
💬 Chatbot web app + HTTP and WebSocket endpoints for LLM inference with the Petals client
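For context, querying a Petals swarm from Python looks roughly like standard transformers usage, except that forward passes run across volunteer GPUs; the checkpoint name below is an illustrative public-swarm model.

```python
# pip install petals
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

name = "petals-team/StableBeluga2"  # illustrative public-swarm checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoDistributedModelForCausalLM.from_pretrained(name)

# Generation is distributed: each block of layers may run on a different peer.
inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```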
#Large Language Model# FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
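The ~4x figure follows from single-token decoding being memory-bound: per-token latency is roughly weight bytes divided by memory bandwidth, and INT4 weights are a quarter the size of FP16. A back-of-envelope sketch with illustrative numbers (not measurements):

```python
# Decode-time inference is dominated by streaming the weights from HBM.
params = 7e9          # 7B-parameter model (illustrative)
bandwidth = 1.0e12    # 1 TB/s memory bandwidth (illustrative)

fp16_bytes = params * 2    # 2 bytes per weight
int4_bytes = params / 2    # 0.5 bytes per weight

t_fp16 = fp16_bytes / bandwidth   # ~14 ms per token
t_int4 = int4_bytes / bandwidth   # ~3.5 ms per token
print(f"speedup ~ {t_fp16 / t_int4:.1f}x")  # -> 4.0x
```

Past a batch of roughly 16-32 tokens the kernel becomes compute-bound rather than bandwidth-bound, which is why the near-ideal speedup is quoted only up to medium batch sizes.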
Open-source, local, self-hosted, and highly optimized inference server supporting ASR/STT, TTS, and LLMs over WebRTC, REST, and WebSocket
Inference code for the LLaMA model
Universal and Transferable Attacks on Aligned Language Models
The LLM vulnerability scanner
#Large Language Model# LLM fine-tuning with PEFT
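As a reference point, parameter-efficient fine-tuning with the peft library boils down to wrapping a base model in an adapter such as LoRA; the base model and target modules below are illustrative choices.

```python
# pip install peft transformers
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # illustrative
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],  # illustrative targets
                    lora_dropout=0.05, task_type="CAUSAL_LM")

# Only the small low-rank adapter matrices are trained; the base is frozen.
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```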
#Large Language Model# [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
LLM as a Chatbot Service
#Large Language Model# Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such ...
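A hedged sketch of the drop-in usage ipex-llm advertises, assuming its transformers-compatible loader and the "xpu" device string as shown in its README; the model choice is illustrative.

```python
# pip install ipex-llm[all]  -- API per the ipex-llm README
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model choice
# Weights are quantized to 4-bit on load, then moved to the Intel GPU.
model = AutoModelForCausalLM.from_pretrained(path, load_in_4bit=True)
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(path)

input_ids = tokenizer("What is an iGPU?", return_tensors="pt").input_ids.to("xpu")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```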