✨✨Latest Advances on Multimodal Large Language Models
#Large Language Models#[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
#Computer Science#[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
#Large Language Models#🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
#Awesome#The Paper List of Large Multi-Modality Models (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, and Conventional Image-Text Matching for Preliminary Insights
Curated papers on Large Language Models in Healthcare and Medical domain
#Large Language Models#[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
#Large Language Models#[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
#Awesome#A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models...
#Large Language Models#[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
#Large Language Models#An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models
The official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
#Large Language Models#A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
✨A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models