#Large Language Models# AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
A Multimodal Native Agent Framework for Smart Devices and More
VisualWebArena is a benchmark for multimodal agents.
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
#Large Language Models# TEN Agent is a world-class multimodal AI agent integrated with the OpenAI Realtime API and RTC, featuring weather checks, web search, vision, and RAG.
LLM, MultiModal, and Agent tools for ComfyUI
Multimodal-GPT
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
Multimodal Unsupervised Image-to-Image Translation
#Computer Science# Jina is a deep-learning-based search framework that supports many data types, such as images, video, long text, and PDFs.
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal multi-task models at scale.
[ACL'19] [PyTorch] Multimodal Transformer
Multimodal Sarcasm Detection Dataset
Toward Multimodal Image-to-Image Translation
#Computer Science# Meta-Transformer for Unified Multimodal Learning
#Computer Science# Represent, send, store and search multimodal data
Translation - A data structure for unstructured data
Emu Series: Generative Multimodal Models from BAAI
A curated list of multimodal-related research.
Reading list for research topics in multimodal machine learning
yubikey-agent is a seamless ssh-agent for YubiKeys.
A framework to enable multimodal models to operate a computer.
#Computer Science# An open-source framework for training large multimodal models.
A survey of multimodal learning research.