#Large Language Models# Use PEFT or full-parameter training to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vis...
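As a rough illustration of the PEFT route such toolkits wrap, here is a minimal LoRA finetuning sketch using the Hugging Face `peft` and `transformers` libraries; the model id, target modules, and hyperparameters are illustrative assumptions, not this project's defaults.

```python
# Minimal LoRA sketch with Hugging Face peft.
# Model id and LoRA settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-0.5B"  # hypothetical small model for a demo
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach low-rank adapters to the attention projections only;
# the base weights stay frozen, so far fewer parameters train.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of weights
```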
An AI-powered file management tool that keeps data private by organizing local text and image files. Using the Llama3.2 3B and LLaVA v1.6 models via the Nexa SDK, it scans, restructures, and organizes ...
LLaVA-NeXT-Image-Llama3-Lora, modified from https://github.com/arielnlee/LLaVA-1.6-ft
A ComfyUI extension that lets you use LLMs served by Ollama, such as Gemma, LLaVA (multimodal), Llama2, Llama3, or Mistral
#Large Language Models#🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
An implementation of training LLaVA with Llama3, supporting both pre-training and finetuning
Image caption generator using LLaVA and Llama3 through the ollama library
Python APIs built with ollama models (llama3.1 & llava)
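For orientation, captioning in stacks like the two entries above usually comes down to a single call against a local Ollama server. A minimal sketch with the official `ollama` Python client follows; the model tag and image path are placeholders, not values from these projects.

```python
# Minimal image-caption sketch with the ollama Python client.
# Assumes an Ollama server is running locally with a LLaVA model
# pulled (e.g. `ollama pull llava`); "photo.jpg" is a placeholder.
import ollama

response = ollama.chat(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence caption for this image.",
            "images": ["photo.jpg"],  # local file path or raw bytes
        }
    ],
)
print(response["message"]["content"])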
LLaVA-Interactive-Demo
LLaVA server (llama.cpp).
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Aligning LMMs with Factually Augmented RLHF
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
A simple "Be My Eyes" web app with a llama.cpp/llava backend
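To make that backend pattern concrete, here is a hedged sketch that asks a locally running llama.cpp server to describe an image through its OpenAI-compatible endpoint. The port, model name, and vision-message support are assumptions: they depend on the server version and on a multimodal projector being loaded alongside the LLaVA-style model.

```python
# Hedged sketch: querying a local llama.cpp server via its
# OpenAI-compatible API. Assumes the server was started with a
# LLaVA-style model plus its multimodal projector and listens on
# port 8080; exact flags and multimodal support vary by version.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

with open("photo.jpg", "rb") as f:  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="llava",  # name is server-dependent
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in front of me?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```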
Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, and Qwen2-VL
Large Language-and-Vision Assistant for Biomedicine, built toward GPT-4-level multimodal capabilities.
Mixture-of-Experts for Large Vision-Language Models
An open-source, commercially usable multimodal model supporting bilingual (Chinese and English) visual-text dialogue.
#Vector Search Engine# Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...