OpenMMLab Detection Toolbox and Benchmark
pix2tex: Using a ViT to convert images of equations into LaTeX code.
This repository contains demos I made with the Transformers library by HuggingFace.
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥][scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction".
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
SwinIR: Image Restoration Using Swin Transformer (official repository)
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
OpenMMLab Pre-training Toolbox and Benchmark
Scenic: A Jax Library for Computer Vision Research and Beyond
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Efficient vision foundation models for high-resolution generation and perception.
EVA Series: Visual Representation Fantasies from BAAI
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
An all-in-one toolkit for computer vision
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
This is a collection of our NAS and Vision Transformer work.
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training