OpenMMLab Detection Toolbox and Benchmark
pix2tex: Using a ViT to convert images of equations into LaTeX code.
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥][scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction".
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
SwinIR: Image Restoration Using Swin Transformer (official repository)
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
OpenMMLab Pre-training Toolbox and Benchmark
Scenic: A Jax Library for Computer Vision Research and Beyond
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Efficient vision foundation models for high-resolution generation and perception.
EVA Series: Visual Representation Fantasies from BAAI
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
An all-in-one toolkit for computer vision
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
This is a collection of our NAS and Vision Transformer work.
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training