#LLM#LLaVA is a Large Language and Vision Assistant with GPT-4V-level capabilities
✨✨Latest Advances on Multimodal Large Language Models
#Computer Science#🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Translation - Maps variable-length sentences to fixed-length vectors using a BERT model
#LLM#The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework. Website: https://swarms.ai
#Computer Science#Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
#LLM#🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
#LLM#InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Algorithms and Publications on 3D Object Tracking
Parsing-free RAG supported by VLMs
#LLM#Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP...
#Computer Science#An open-source implementation of Gemini, the Google model that will "eclipse ChatGPT"
[CVPR 2023] Collaborative Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
#LLM#An open-source implementation for training LLaVA-NeXT.
#LLM#Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
An official PyTorch implementation of the CRIS paper
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Official repository for VisionZip (CVPR 2025)