#Computer Science#LAVIS - A One-stop Library for Language-Vision Intelligence
Compose multimodal datasets 🎹
This repository is built in association with our position paper on "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As a part of this release we share the information...
#Computer Science#PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.
[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.
A dataset of 500,000 multimodal short videos, with baseline models (TensorFlow 2.0).
This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....
Code and data to evaluate LLMs on ENEM, the main standardized Brazilian university admission exam.
#Awesome#[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Collects a multimodal dataset of Wikipedia articles and their images
Millions-Level Face/Human-Scene Image-Text Datasets
Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
#Computer Science#Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation
#Data Warehouse#Towards Explainable Multimodal Depression Recognition for Clinical Interviews
Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)
Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Associati...
Image Recommendation for Wikipedia Articles
#Computer Science#Create a large, well-managed and clean dataset for the task of music composition for video soundtracks.
#Computer Science#All experiments were done to classify multimodal data.