# vision-and-language

https://static.github-zh.com/github_avatars/aishwaryanr?size=40

#interview# A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!

19.29 k
14 days ago
https://static.github-zh.com/github_avatars/roboflow?size=40

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2.64 k
8 hours ago
https://static.github-zh.com/github_avatars/salesforce?size=40

Code for ALBEF: a new vision-language pre-training method

Python 1.72 k
3 years ago
https://static.github-zh.com/github_avatars/dandelin?size=40

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Python 1.5 k
2 years ago
https://static.github-zh.com/github_avatars/om-ai-lab?size=40
Python 1.34 k
10 months ago
https://static.github-zh.com/github_avatars/NVlabs?size=40
Python 1.31 k
2 years ago
https://static.github-zh.com/github_avatars/yuewang-cuhk?size=40
1.16 k
3 years ago
https://static.github-zh.com/github_avatars/rhymes-ai?size=40

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 1.07 k
9 months ago
https://static.github-zh.com/github_avatars/OFA-Sys?size=40

A general representation model across vision, audio, and language modalities. Paper: "ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities"

Python 1.05 k
1 year ago
https://static.github-zh.com/github_avatars/YehLi?size=40

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

Python 968
3 years ago
https://static.github-zh.com/github_avatars/mbzuai-oryx?size=40

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 919
2 months ago
https://static.github-zh.com/github_avatars/InternRobotics?size=40

[ECCV 2024 Best Paper Candidate & TPAMI 2025] PointLLM: Empowering Large Language Models to Understand Point Clouds

Python 903
2 months ago
https://static.github-zh.com/github_avatars/SunzeY?size=40
Jupyter Notebook 844
3 months ago