#Natural Language Processing#Unilm: large-scale self-supervised pre-training across tasks, languages, and modalities
#Android#Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
#Large Language Models#Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
#Large Language Models#Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
#Large Language Models#InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Agent S: an open agentic framework that uses computers like a human
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
#Large Language Models#A family of lightweight multimodal models.
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
#Large Language Models#Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
#Data Warehouse#🚀🚀🚀A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica...
#Large Language Models#✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
#Algorithm Practice#OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
#Large Language Models#[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization