#计算机科学#A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
翻译 - 来自Facebook AI Research(FAIR)的视觉和语言多模式研究的模块化框架
#大语言模型#InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...
streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
#大语言模型#Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
#自然语言处理#北京航空航天大学大数据高精尖中心自然语言处理研究团队开展了智能问答的研究与应用总结。包括基于知识图谱的问答(KBQA),基于文本的问答系统(TextQA),基于表格的问答系统(TableQA)、基于视觉的问答系统(VisualQA)和机器阅读理解(MRC)等,每类任务分别对学术界和工业界进行了相关总结。
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-bas...
An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
#Awesome#A curated list of Visual Question Answering(VQA)(Image/Video Question Answering),Visual Question Generation ,Visual Dialog ,Visual Commonsense Reasoning and related area.
#大语言模型#Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)
PyTorch implementation for the Neuro-Symbolic Concept Learner (NS-CL).
#计算机科学#PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
#计算机科学#A lightweight, scalable, and general framework for visual question answering research
#自然语言处理#Multimodal Question Answering in the Medical Domain: A summary of Existing Datasets and Systems