#LLM#LLaVA is a large language and vision assistant with GPT-4V-level capabilities
#LLM#Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
#LLM#🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
#LLM#InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
#LLM#An open-source implementation for training LLaVA-NeXT.
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
#LLM#(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
🧘🏻♂️ KarmaVLM (相生): A family of highly efficient and powerful visual language models.
#LLM#Multimodal Instruction Tuning for Llama 3
Build a simple, basic multimodal large model from scratch 🤖
#NLP#[ACM MMGR '24] 🔍 Shotluck Holmes: A family of small-scale LLVMs for shot-level video understanding
PyTorch implementation of OpenAI's CLIP model for image classification, visual search, and visual question answering (VQA); see the usage sketch after this list.
#LLM#Docker image for LLaVA: Large Language and Vision Assistant
Efficient Video Question Answering
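For the CLIP entry above, a minimal sketch of zero-shot image classification, the core technique behind CLIP-based classification and visual search. It assumes the Hugging Face `transformers` API and the public `openai/clip-vit-base-patch32` checkpoint rather than the listed repo's own interface; the file name `cat.jpg` is a placeholder.

```python
# Minimal sketch: CLIP zero-shot image classification.
# Assumes Hugging Face transformers, not the listed repo's own API.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # placeholder local image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Encode the image and all candidate captions in one batch.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds one image-text similarity score per label;
# softmax turns them into a distribution over the candidate labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print({label: round(p.item(), 3) for label, p in zip(labels, probs[0])})
```

Because CLIP scores each caption against the image in a shared embedding space, swapping the label list re-targets the classifier without any retraining.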