✨✨Latest Advances on Multimodal Large Language Models
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Mixture-of-Experts for Large Vision-Language Models