Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
LAVIS - A One-stop Library for Language-Vision Intelligence
Collection of AWESOME vision-language models for vision tasks
Multi Task Vision and Language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
DeepSeek-VL: Towards Real-World Vision-Language Understanding
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Bridging Vision and Language Models
Mixture-of-Experts for Large Vision-Language Models
A curated list of prompt-based paper in computer vision and vision-language learning.
deep learning, image retrieval, vision and language
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Code for ALBEF: a new vision-language pre-training method
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Recent Advances in Vision and Language Pre-training (VLP)
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"