#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
#Computer Science# Instruction Following Agents with Multimodal Transformers
#Computer Science# Code for studying OpenAI's CLIP explainability
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
VTC: Improving Video-Text Retrieval with User Comments
#Large Language Models# A collection of VLM papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.
#Computer Science# Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation
A list of research papers on knowledge-enhanced multimodal learning
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Streamlit App Combining Vision, Language, and Audio AI Models
#Large Language Models# Coding a multimodal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
#Natural Language Processing# VizWiz Challenge Term Project for Multimodal Machine Learning @ CMU (11-777)