This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Reading list for research topics in multimodal machine learning
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
#Computer Science# Meta-Transformer for Unified Multimodal Learning
NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
A survey of multimodal learning research.
The code for the paper "Multimodal Task-driven Dictionary Learning for Image Classification".
Research Trends in LLM-guided Multimodal Learning.
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
BLOCK (AAAI 2019), with a multimodal fusion library for deep learning models
Multimodal-GPT
Pan-Cancer Integrative Histology-Genomic Analysis via Multimodal Deep Learning - Cancer Cell
GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
#Large Language Models# AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Multimodal Unsupervised Image-to-Image Translation
#Computer Science# Jina is a deep-learning-based search framework that supports many data types, such as images, videos, long text, and PDFs.
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
[ACL'19] [PyTorch] Multimodal Transformer