#Android#Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
#Large Language Model#A family of lightweight multimodal models.
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).