VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Open Source Generative Process Automation (i.e., Generative RPA): AI-first process automation with large language (LLM), action (LAM), multimodal (LMM), and visual-language (VLM) models
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
A Framework of Small-scale Large Multimodal Models
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
A collection of resources on applications of multi-modal learning in medical imaging.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
An open-source implementation for training LLaVA-NeXT.
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Open Platform for Embodied Agents
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
The official evaluation suite and dynamic data release for MixEval.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
A curated list of awesome multimodal studies.
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
[NeurIPS 2024] Official PyTorch implementation of LOVA3