Efficient Triton Kernels for LLM Training
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
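The "single line of code" refers to wrapping an existing PyTorch model with Kernl's optimizer entry point. A minimal sketch, assuming Kernl's `optimize_model` API and a Hugging Face BERT model as an example workload:

```python
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model  # Kernl's optimization entry point

# Load a standard PyTorch transformer model and move it to the GPU in half precision.
model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda().half()

# The single line: replace eligible modules/attention with fused Triton kernels.
optimize_model(model)

# Inference proceeds exactly as before; inputs must live on the GPU.
inputs = {
    "input_ids": torch.randint(0, 30000, (1, 128), device="cuda"),
    "attention_mask": torch.ones(1, 128, dtype=torch.long, device="cuda"),
}
with torch.inference_mode():
    outputs = model(**inputs)
```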
A service for autodiscovery and configuration of applications running in containers

#LLM# Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Playing with the Tigress software protection: breaking some of its protections and solving its reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis, and LLVM.
#Data Warehouse# 🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related Datasets and Applica...
#Computer Science# A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
FlagGems is an operator library for large language models implemented in the Triton language.
Linux kernel module to support Turbo mode and RGB Keyboard for Acer Predator notebook series
Automatic ROPChain Generation
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
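For readers new to Triton, the entry above concerns the GPU programming language itself. As a rough illustration (not taken from any of the listed repositories), a minimal element-wise addition kernel in Triton looks like this:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against out-of-bounds accesses
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```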
OpenDILab RL HPC OP Lib, including CUDA and Triton kernels
LLVM based static binary analysis framework
#LLM# 🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR and High Performance Computing (HPC) projects.
#Computer Science# A performance library for machine learning applications.
#Computer Science# ClearML - Model-Serving Orchestration and Repository Solution
#Computer Science# NVIDIA-accelerated, deep learned model support for image space object detection
(WIP) This deployment framework aims to provide a simple, lightweight, fast-to-integrate, pipelined deployment framework for algorithm services that ensures reliability, high concurrency and scalability of...