MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
#Natural Language Processing#ModelScope: bring the notion of Model-as-a-Service to life.
#Large Language Models#[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
A state-of-the-art-level open visual language model | Multimodal pretrained model
#Computer Science#Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
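#Large Language Models#Start building LLM-empowered multi-agent applications in an easier way.
#Natural Language Processing#This project is a Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly carry out image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and similar tasks in the Chinese domain.
#Search#Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai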
Open Source Routing Engine for OpenStreetMap
Chinese-English bilingual multimodal conversational language model
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
#Large Language Models#A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
GPT4V-level open-source multi-modal model based on Llama3-8B
Mixture-of-Experts for Large Vision-Language Models
#Large Language Models#Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
#Large Language Models#[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
#Large Language Models#Project Page for "LISA: Reasoning Segmentation via Large Language Model"
SALMONN: Speech Audio Language Music Open Neural Network
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
Unified Controllable Visual Generation Model