#Large Language Models#🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Large Language-and-Vision Assistant for Biomedicine, built toward multimodal GPT-4-level capabilities.
Mixture-of-Experts for Large Vision-Language Models
LLaVA-Interactive-Demo
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
A simple "Be My Eyes" web app with a llama.cpp/llava backend
LLaVA server (llama.cpp).
Aligning LMMs with Factually Augmented RLHF
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Streamlines the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL.
An open-source, commercially usable multimodal model supporting bilingual Chinese-English visual-text dialogue.
#Vector Search Engine#Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
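Many of the entries above are LLaVA variants distributed as Hugging Face checkpoints. As a rough illustration only (not tied to any specific repo in this list), a minimal sketch of querying a LLaVA-1.5-style model through the `transformers` API might look like the following; the checkpoint id, image URL, and prompt template are assumptions:

```python
# Minimal, hypothetical sketch of running a LLaVA-1.5-style checkpoint with transformers.
# The model id, image URL, and prompt format below are assumptions for illustration.
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Placeholder image URL; substitute your own image.
image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

# Preprocess the image/text pair, generate, and decode the answer.
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```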