Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks
Emu Series: Generative Multimodal Models from BAAI
Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train Dataset for table understanding and develop a generalist tabula...
Path to Multimodal Generalist: Level, Benchmark and Model
Multimodal large language model for generalist Minecraft agent.
Multimodal-GPT
A generalist algorithm for cellular segmentation with human-in-the-loop capabilities
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
#Large Language Models# AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Multimodal Unsupervised Image-to-Image Translation
#Computer Science# Jina is a deep-learning-based search framework that supports many content types, such as images, video, long text, and PDF.
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
[ACL'19] [PyTorch] Multimodal Transformer
Multimodal Sarcasm Detection Dataset
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
Toward Multimodal Image-to-Image Translation
#Computer Science# Meta-Transformer for Unified Multimodal Learning
#Computer Science# Represent, send, store and search multimodal data
The data structure for unstructured data
A curated list of multimodal-related research.
Reading list for research topics in multimodal machine learning