mllm · GitHub Topics

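#Natural Language Processing#UniLM is a large-scale self-supervised pre-trained model across tasks, languages, and modalities

natural-language-processing pre-trained-model unilm minilm layoutlm layoutxlm beit document-ai trocr beit-3 foundation-models large-language-models multimodal mllm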

Python 21.05 k

1 month ago

X-PLUG / MobileAgent

#Android#Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

agent gpt4v mllm mobile-agents multimodal multimodal-large-language-models multimodal-agent Android App GUI mobile-automation copilot harmony iOS

Python 4.05 k

3 days ago

NExT-GPT / NExT-GPT

#Large Language Model#Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

ChatGPT foundation-models gpt-4 instruction-tuning large-language-models large-language-model multi-modal-chatgpt multimodal visual-language-learning mllm

Python 3.48 k

5 months ago

ant-research / MagicQuill

[CVPR'25] Official Implementation of the Paper - MagicQuill: An Intelligent Interactive Image Editing System

aigc image-editing mllm gradio

Python 3.27 k

1 month ago

manycore-research / SpatialLM

SpatialLM: Large Language Model for Spatial Understanding

mllm

Python 2.96 k

16 days ago

atfortes / Awesome-LLM-Reasoning

#Large Language Model#Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

language-models reasoning prompt in-context-learning ChatGPT chain-of-thought prompt-engineering cot Awesome Lists gpt mllm multimodal papers gpt-4o openai-o1 strawberry deepseek deepseek-r1

2.96 k

25 days ago

InternLM / InternLM-XComposer

#Large Language Model#InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

ChatGPT visual-language-learning multi-modality foundation gpt-4 instruction-tuning mllm multimodal vision-language-model language-model large-language-model large-vision-language-model large-language-models vision-transformer gpt

Python 2.81 k

3 months ago

simular-ai / Agent-S

Agent S: an open agentic framework that uses computers like a human

agent-computer-interface ai-agents computer-automation gui-agents memory mllm planning retrieval-augmented-generation in-context-reinforcement-learning computer-use grounding

Python 2.19 k

21 hours ago

X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

chart-understanding document-understanding mllm multimodal multimodal-large-language-models table-understanding

Python 2.15 k

4 months ago

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

chatbot clip computer-vision dino instruction-tuning large-language-models llms mllm multimodal-large-language-models representation-learning

Python 1.89 k

5 months ago

SkyworkAI / Skywork-R1V

#Large Language Model#Pioneering Multimodal Reasoning with CoT

deepseek-r1 large-language-models mllm

Python 1.82 k

4 days ago

coderonion / awesome-yolo-object-detection

#Data Warehouse#🚀🚀🚀 A collection of awesome public YOLO object detection series projects and related object detection datasets.

yolo yolov5 tensorrt object-detection yolov8 CUDA large-language-models llama vlm datasets deepseek GUI mllm qwen

1.45 k

3 days ago

magic-research / Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

computer-vision mllm

Python 1.02 k

5 days ago

BAAI-DCAI / Bunny

#Large Language Model#A family of lightweight multimodal models.

mllm ChatGPT gpt-4 multimodal-large-language-models vlm chinese english

Python 1.01 k

5 months ago

CircleRadon / Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

mllm sam visual-instruction-tuning pixel-understanding

Python 816

1 month ago

NVlabs / EAGLE

#Large Language Model#Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Demo gpt4 huggingface llama llama3 llava lmm mllm large-language-model large-language-models

Python 654

4 days ago

coderonion / awesome-llm-and-aigc

#Data Warehouse#🚀🚀🚀 A collection of awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applica...

ChatGPT gpt large-language-models large-language-model Awesome Lists llama aigc langchain hugging-face datasets yolo triton CUDA vlm deepseek qwen mllm ai4science

654

3 days ago

VITA-MLLM / Woodpecker

#Large Language Model#✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

hallucination hallucinations large-language-models large-language-model mllm multimodal-large-language-models multimodality

Python 635

4 months ago

taco-group / OpenEMMA

#Algorithm Practice#OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

algorithm-artificial-intelligence autonomous-driving autonomous-vehicles autonomy generative-ai machine-learning mllm Network perception

Python 613

7 hours ago

FoundationVision / Groma

#Large Language Model#[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

grounding large-language-model mllm large-language-models foundation-models llama llama2 multimodal vision-language-model

Python 556

10 months ago