#LargeLanguageModels# A high-throughput and memory-efficient inference and serving engine for LLMs
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving-stack options.
A production-ready inference server supporting any AI model on all major hardware platforms (CPU, GPU, TPU, Apple Silicon). Inferno seamlessly deploys and serves language models from Hugging Face, loc...