MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
#Natural Language Processing# ModelScope: bring the notion of Model-as-a-Service to life.
A state-of-the-art open visual language model | Multimodal pretrained model
#Large Language Models# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model approaching GPT-4o performance
#Computer Science# Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
#Large Language Models# Start building LLM-empowered multi-agent applications in an easier way.
#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Open Source Routing Engine for OpenStreetMap
Chinese-English bilingual multimodal conversational language model
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
#Large Language Models# A C#/.NET library to run LLMs (🦙LLaMA/LLaVA) efficiently on your local device.
GPT-4V-level open-source multimodal model based on Llama3-8B
Mixture-of-Experts for Large Vision-Language Models
#Large Language Models# [NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
#Large Language Models# Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
#Large Language Models# Project Page for "LISA: Reasoning Segmentation via Large Language Model"
SALMONN: Speech Audio Language Music Open Neural Network
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Unified Controllable Visual Generation Model