multimodal-datasets · GitHub Topics

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning deep-learning-library image-captioning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering multimodal-datasets multimodal-deep-learning

Jupyter Notebook 10.44k

5 months ago

remyxai / VQASynth

Compose multimodal datasets 🎹

multimodal-datasets multimodal-deep-learning synthetic-dataset-generation

Python 337

7 days ago

drmuskangarg / Multimodal-datasets

This repository was built in association with our position paper, "Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers". As part of this release we share the information...

multimodal-datasets

283

3 years ago

AnkurDeria / MFT

#Computer Science# PyTorch implementation of Multimodal Fusion Transformer for Remote Sensing Image Classification.

deep-learning multimodal-datasets multimodal-deep-learning remote-sensing transformer-models

Jupyter Notebook 207

1 year ago

wisdomikezogwo / quilt1m

[NeurIPS 2023 Oral] Quilt-1M: One Million Image-Text Pairs for Histopathology.

clip-model histopathology multimodal-datasets vlm

Python 157

1 year ago

yuanxiaosc / Multimodal-short-video-dataset-and-baseline-classification-model

A dataset of 500,000 multimodal short videos, with baseline models (TensorFlow 2.0).

multimodal-datasets classification-model Tensorflow

Jupyter Notebook 128

6 years ago

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://dl....

cross-modal multimodal-datasets multimodal-deep-learning multimodal-pre-trained-model transformer-models vision-language-pretraining

1 year ago

roboflow / rf100-vl

Code from the paper "Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models"

computer-vision multimodal-datasets object-detection

Python 48

3 days ago

piresramon / gpt-4-enem

Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.

artificial-intelligence llm-inference llms multimodal-datasets

Python 46

4 months ago

Yuco-Z / Awesome-Multi-Modal-Dialog

#Awesome#[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics

Awesome Lists dialogue multimodal multimodal-deep-learning multimodal-datasets multimodal-learning

3 months ago

JunweiLiang / FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

visual-question-answering vision-and-language multimodal-deep-learning multimodal-datasets

Python 32

6 years ago

OlehOnyshchak / pyWikiMM

Collects a multimodal dataset of Wikipedia articles and their images

wikipedia multimodal multimodality multimodal-datasets multimodal-learning database data-cleaning data-collection data-processing

Python 15

2 years ago

ddw2AIGROUP2CQUPT / Large-Scale-Multimodal-Face-Datasets

Millions-Level Face/Human-Scene Image-Text Datasets

multimodal-datasets

3 months ago

lujiaying / MUG-Bench

Data and code of the Findings of EMNLP'23 paper MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields

multimodal-datasets multimodal-learning

Python 9

1 year ago

deepmancer / vlm-toolbox

#Computer Science# Vision-Language Models Toolbox: Your all-in-one solution for multimodal research and experimentation

clip deep-learning deep-learning-library multimodal-datasets multimodal-deep-learning multimodal-learning prompt-tuning vision-and-language vision-framework vision-language-transformer zero-shot-classification PyTorch transformers

Jupyter Notebook 8

2 months ago

NUSTM / EMDRC

#Data Warehouse# Towards Explainable Multimodal Depression Recognition for Clinical Interviews

mental-health dataset datasets affective-computing multimodal-datasets

3 months ago

gcunhase / AnnotatedMV-PreProcessing

Pre-Processing of Annotated Music Video Corpora (COGNIMUSE and DEAP)

multimodal-datasets

Python 5

4 years ago

clp-research / language-models-multimodal-tasks

Official Git repository for "Hakimov, S., and Schlangen, D., (2023). Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks. Findings of the Associati...

language-model multimodal-datasets multimodal-learning

Python 3

2 years ago

OlehOnyshchak / WikiImageRecommendation

Image Recommendation for Wikipedia Articles

wikipedia multimodal-learning multimodal-deep-learning multimodal-datasets text Image recommender-systems data-collection

Jupyter Notebook 3

4 years ago