#Large Language Models# Unified KV Cache Compression Methods for Auto-Regressive Models
#Large Language Models# LLM notes covering model inference, Transformer model structure, and LLM framework code analysis.
Implement Llama 3 inference step by step: grasp the core concepts, master the process derivation, and write the code.
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
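H2O's core idea is to keep only the "heavy-hitter" KV cache entries: tokens that have accumulated the most attention. A minimal NumPy sketch of that scoring-and-eviction step (function and parameter names are illustrative, not from the paper's released code):

```python
import numpy as np

def h2o_evict(attn_weights, budget):
    """Choose which cached token positions to keep.

    attn_weights: (num_queries, seq_len) attention weights accumulated so far.
    budget: number of KV cache entries to retain.
    Returns the sorted indices of the retained positions.
    """
    # Heavy-hitter score: total attention each cached token has received.
    scores = attn_weights.sum(axis=0)
    keep = np.argsort(scores)[-budget:]  # top-`budget` heavy hitters
    return np.sort(keep)

# Toy example: 4 queries attending over 6 cached tokens.
rng = np.random.default_rng(0)
w = rng.random((4, 6))
kept = h2o_evict(w, budget=3)
print(kept)  # indices of the 3 most-attended cache slots
```

A real implementation does this per attention head and combines the heavy hitters with a window of the most recent tokens, which this sketch omits.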
#Large Language Models# LLM KV cache compression made easy
#Large Language Models# Awesome-LLM-KV-Cache: a curated list of 📙 awesome LLM KV cache papers with code.
#Large Language Models# Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
HierarchicalKV is part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. Its key capability is storing key-value feature embeddings on hig...
Completion After Prompt Probability: make your LLM make a choice.
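The completion-after-prompt-probability trick scores each candidate answer by the log-probability the model assigns to its tokens when appended to the prompt, then picks the argmax. A toy sketch with hand-supplied per-token log-probs (illustrative numbers only; a real setup would read these from a causal LM's logits):

```python
import math

def score_completion(token_logprobs):
    """Log-probability of a completion = sum of its tokens' conditional log-probs."""
    return sum(token_logprobs)

def choose(candidates):
    """candidates: dict mapping completion text -> list of per-token log-probs.

    Returns the completion the model finds most probable after the prompt.
    """
    return max(candidates, key=lambda c: score_completion(candidates[c]))

# Hypothetical log-probs a model might assign to each answer's tokens.
candidates = {
    " Paris":  [math.log(0.7)],
    " London": [math.log(0.1)],
    " Paris, France": [math.log(0.7), math.log(0.05), math.log(0.2)],
}
print(choose(candidates))  # → " Paris"
```

Note that summed log-probs penalize longer completions; length-normalizing (dividing by token count) is a common variant when candidates differ in length.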
#NLP# This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT) variant. The implementation focuses on the model architecture ...
#Large Language Models# Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)
Notes about LLaMA 2 model
Mistral and Mixtral (MoE) from scratch
#NLP# Fine-tuned Mistral 7B Persian large language model (LLM) / Persian Mistral 7B
#NLP# Image Captioning With MobileNet-LLaMA 3
#Large Language Models# Simple and easy-to-understand PyTorch implementation of the GPT and LLaMA large language models (LLMs) from scratch, with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding (R...
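Rotary positional embedding (RoPE), listed among that repo's components, encodes position by rotating each pair of embedding dimensions by a position-dependent angle, so query-key dot products depend only on relative position. A minimal NumPy sketch (one vector, single head; not the repo's code):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary positional embedding to one vector.

    x: (d,) embedding with even d; pos: integer token position.
    Each pair (x[2i], x[2i+1]) is rotated by angle pos * base**(-2i/d).
    """
    d = x.shape[0]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)  # per-pair rotation frequencies
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x, dtype=float)
    out[0::2] = x1 * cos - x2 * sin  # 2D rotation applied pair-wise
    out[1::2] = x1 * sin + x2 * cos
    return out

q = np.array([1.0, 0.0, 0.5, -0.5])
print(np.linalg.norm(rope(q, pos=7)), np.linalg.norm(q))  # rotations preserve norm
```

The defining property: `rope(q, m) · rope(k, n)` depends only on the offset `n - m`, which is what lets attention scores encode relative position for free.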
#Large Language Models# SCAC strategy for efficient and effective KV cache eviction in LLMs
Java-based caching solution designed to temporarily store key-value pairs with a specified time-to-live (TTL) duration.
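The repo above is Java, but the TTL-cache idea it describes is language-agnostic: each entry stores an expiry timestamp and is lazily evicted on read. A minimal Python sketch of the concept (class and method names are my own, not the repo's API):

```python
import time

class TTLCache:
    """Minimal key-value cache whose entries expire `ttl` seconds after insertion."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry_timestamp)

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        item = self._store.get(key)
        if item is None:
            return default
        value, expires = item
        if time.monotonic() >= expires:
            del self._store[key]  # lazy eviction: expired entries removed on read
            return default
        return value

cache = TTLCache(ttl=0.05)
cache.put("k", 42)
print(cache.get("k"))  # 42 while fresh
time.sleep(0.06)
print(cache.get("k"))  # None after expiry
```

Production TTL caches usually add a background sweeper or eviction on write so expired entries don't linger unread; lazy eviction keeps the sketch short.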