✨✨Latest Advances on Multimodal Large Language Models
#Large Language Models#[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
#Computer Science#[ICML 2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
#Large Language Models#🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
#Awesome#The Paper List of Large Multi-Modality Models (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, and Conventional Image-Text Matching for Preliminary Insights
Curated papers on Large Language Models in Healthcare and Medical domain
#Large Language Models#[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
#Large Language Models#[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
#Awesome#A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models...
#Large Language Models#[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
#Large Language Models#An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models
The official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed specifically for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
#Large Language Models#A benchmark for evaluating the capabilities of large vision-language models (LVLMs)
✨A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
[NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models