#LLM# LLaVA is a large language-and-vision assistant with GPT-4V-level capabilities.
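For a feel of what inference with a LLaVA-style model looks like, here is a minimal sketch using the Hugging Face `transformers` port (llava-hf) rather than this repo's own serving CLI; the model ID, image URL, and prompt template are assumptions based on the llava-hf 1.5 releases.

```python
# Minimal LLaVA inference sketch via the Hugging Face `transformers` port
# (llava-hf), not this repo's own CLI. Model ID, image URL, and prompt
# template are assumptions based on the llava-hf 1.5 releases.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(
    requests.get(
        "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True
    ).raw
)
# LLaVA-1.5 expects the <image> placeholder inside a USER/ASSISTANT turn.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```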
#LLM# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o's performance.
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
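Qwen-VL-Chat is served through a `model.chat` helper loaded with `trust_remote_code`, as shown in the repo's README; the sketch below assumes that interface and an illustrative image URL.

```python
# Sketch of Qwen-VL-Chat inference; assumes the model.chat /
# tokenizer.from_list_format helpers shipped via trust_remote_code,
# as shown in the repo's README. The image URL is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Interleave image and text into a single query string.
query = tokenizer.from_list_format([
    {"image": "http://images.cocodataset.org/val2017/000000039769.jpg"},
    {"text": "What is in this picture?"},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
```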
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
#LLM# InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Align Anything: Training All-modality Model with Feedback
#Computer Science# Collection of AWESOME vision-language models for vision tasks
#LLM# The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, ...
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
#LLM# 🚀 Train a 26M-parameter multimodal VLM from scratch in just 1 hour! 🌏
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
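To give a sense of the late-interaction retrieval workflow these models implement, here is a minimal sketch following the colpali-engine README; the checkpoint name and the `score_multi_vector` helper are assumptions tied to that package's current API and may differ across versions.

```python
# Minimal ColPali retrieval sketch following the colpali-engine README;
# the checkpoint name and score_multi_vector helper are assumptions tied
# to that package's API and may differ across versions.
import torch
from PIL import Image
from colpali_engine.models import ColPali, ColPaliProcessor

model_name = "vidore/colpali-v1.2"
model = ColPali.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
).eval()
processor = ColPaliProcessor.from_pretrained(model_name)

# Placeholder page image and query; in practice these are document scans.
images = [Image.new("RGB", (448, 448), color="white")]
queries = ["Where is the revenue table?"]

batch_images = processor.process_images(images).to(model.device)
batch_queries = processor.process_queries(queries).to(model.device)

with torch.no_grad():
    image_embeddings = model(**batch_images)   # multi-vector page embeddings
    query_embeddings = model(**batch_queries)  # multi-vector query embeddings

# Late-interaction (MaxSim) scoring between each query and each page.
scores = processor.score_multi_vector(query_embeddings, image_embeddings)
print(scores)
```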
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
#LLM# 日本語LLMまとめ - Overview of Japanese LLMs
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
#NLP# This series will take you on a journey from the fundamentals of NLP and Computer Vision to the cutting edge of Vision-Language Models.
#LLM# MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
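A minimal sketch of MLX-VLM's load/generate workflow on Apple silicon; the exact `generate` signature and the quantized model ID are assumptions, since the API has shifted across releases.

```python
# Minimal MLX-VLM sketch (Apple silicon only). The load/generate keyword
# arguments and the quantized model ID are assumptions -- signatures have
# shifted across releases, so check the repo's README for your version.
from mlx_vlm import load, generate

model, processor = load("mlx-community/Qwen2-VL-2B-Instruct-4bit")

output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="http://images.cocodataset.org/val2017/000000039769.jpg",
    max_tokens=100,
)
print(output)
```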
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
#Computer Science# [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want