VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Open Source Generative Process Automation (i.e., Generative RPA): AI-first process automation with large language (LLM), action (LAM), multimodal (LMM), and visual-language (VLM) models
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
A Framework of Small-scale Large Multimodal Models
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
A collection of resources on applications of multi-modal learning in medical imaging.
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
An open-source implementation for training LLaVA-NeXT.
[CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Open Platform for Embodied Agents
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
The official evaluation suite and dynamic data release for MixEval.
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
A curated list of awesome multimodal studies.
The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate".
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
[NeurIPS 2024] Official PyTorch implementation of LOVA3