TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
Reading list for research topics in multimodal machine learning
✨✨Latest Advances on Multimodal Large Language Models
#Computer Science# Jina is a deep-learning-based search framework that supports many data types, such as images, video, long text, and PDFs.
A real-time Multimodal Emotion Recognition web app for text, sound and video inputs
Multimodal-GPT
#Large Language Models# AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
[ACL'19] [PyTorch] Multimodal Transformer
Multimodal Unsupervised Image-to-Image Translation
A curated list of Multimodal Related Research.
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Multimodal Sarcasm Detection Dataset
Toward Multimodal Image-to-Image Translation
#Computer Science# Meta-Transformer for Unified Multimodal Learning
#Computer Science# Represent, send, store and search multimodal data
Translation: a data structure for unstructured data.
Emu Series: Generative Multimodal Models from BAAI
A framework to enable multimodal models to operate a computer.
#Computer Science# An open-source framework for training large multimodal models.
A Survey on multimodal learning research.
#Large Language Models# A family of lightweight multimodal models.
Research Trends in LLM-guided Multimodal Learning.