lmm · GitHub Topics

#Large Language Models#The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, ...

ai-agent ai-agents-framework computer-control cradle gcc generative-ai grounding large-language-models lmm multimodality vision-language-model vlm artificial-intelligence

Python 2.07 k

5 months ago

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

foundation-models lmm vision-and-language vision-language-model llm-agent

Python 863

5 months ago

NVlabs / EAGLE

#Large Language Models#Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Demo gpt4 huggingface llama llama3 llava lmm mllm large-language-models

Python 654

4 days ago

LLaVA-VL / LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

lmm multimodal

Python 368

9 months ago

tianyi-lab / HallusionBench

#Large Language Models#[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark gpt-4 gpt-4v llava benchmarks hallucination lmm large-language-models large-vision-language-models

Python 280

5 months ago

mbzuai-oryx / Video-LLaVA

#Large Language Models#PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

large-language-models lmm Video grounding transcription

Python 257

1 year ago

CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

connector lmm mllm

Python 245

4 months ago

TIGER-AI-Lab / Mantis

Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]

language vision lmm mllm Video vlm multimodal

Python 211

20 days ago

TideDra / VL-RLHF

#Large Language Models#An RLHF Infrastructure for Vision-Language Models

dpo large-language-models lmm mllm rlhf vlm

Python 171

5 months ago

xieyuquanxx / awesome-Large-MultiModal-Hallucination

😎 A curated list of awesome LMM hallucination papers, methods & resources.

hallucination multi-modal lmm multimodal

150

1 year ago

Q-Future / A-Bench

[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

evaluation lmm

143

2 months ago

Javis603 / Discord-AIBot

#Large Language Models#🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top-tier AI models and supports...