#Large Language Models#A high-throughput and memory-efficient inference and serving engine for LLMs
#Computer Science#Large-scale LLM inference engine
Foundation model benchmarking tool. Run any model on any AWS platform and benchmark performance across instance types and serving stack options.
This Guidance demonstrates how to deploy a machine learning inference architecture on Amazon Elastic Kubernetes Service (Amazon EKS). It addresses the basic implementation requirements as well as ways...
#Natural Language Processing#CMP314 Optimizing NLP models with Amazon EC2 Inf1 instances in Amazon SageMaker
Collection of best practices, reference architectures, examples, and utilities for foundation model development and deployment on AWS.
This repository provides an easy hands-on way to get started with AWS Inferentia. A demonstration of this hands-on can be seen in the AWS Innovate 2023 - AIML Edition session.
Sentence Transformers on EC2 Inf1
#Large Language Models#Deploy Large Models on AWS Inferentia (Inf2) instances.