#Natural Language Processing# Unilm is a large-scale self-supervised pre-trained model spanning tasks, languages, and modalities
#Android# Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
#Large Language Models# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
#Large Language Models# Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
#Large Language Models# A family of lightweight multimodal models.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[ICLR 2025] Agent S: an open agentic framework that uses computers like a human
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
#Large Language Models# ✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
#Large Language Models# Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
#Large Language Models# [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
#Algorithm Practice# OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
#Large Language Models# Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
#Large Language Models# 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train...
#Large Language Models# This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral