A curated list of awesome vision and language resources (still under construction... stay tuned!)
Multi Task Vision and Language
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Recent Advances in Vision and Language Pre-training (VLP)
Pretrain Vision and Large Language Models in Python, Published by Packt
deep learning, image retrieval, vision and language
Recent Advances in Vision and Language PreTrained Models (VL-PTMs)
Strong and Open Vision Language Assistant for Mobile Devices
Bridging Vision and Language Model
Collection of AWESOME vision-language models for vision tasks
A curated list of prompt-based papers in computer vision and vision-language learning.
A curated list for vision-and-language navigation. ACL 2022 paper "Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions"
Vision-Language Pre-training for Image Captioning and Question Answering
LAVIS - A One-stop Library for Language-Vision Intelligence
GIT: A Generative Image-to-text Transformer for Vision and Language
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]
Mixture-of-Experts for Large Vision-Language Models
Search over large image datasets with natural language and computer vision!
DeepSeek-VL: Towards Real-World Vision-Language Understanding