multi-modal · GitHub Topics

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19.2 k

1 month ago

#Data warehouse# Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....

Translation - The fastest way to access and manage PyTorch and TensorFlow datasets. Easily build scalable data pipelines. Leading Data 2.0 http://activeloop.ai

datasets deep-learning machine-learning data-science PyTorch Tensorflow Python artificial-intelligence mlops computer-vision cv image-processing datalake langchain llm large-language-models vector-database vector-search multi-modal

Python 8.52 k

13 days ago

modelscope / modelscope

#Natural language processing# ModelScope: bring the notion of Model-as-a-Service to life.

natural-language-processing cv speech multi-modal science deep-learning machine-learning Python

Python 7.69 k

2 days ago

OpenGVLab / InternVL

#LLM# [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

image-classification image-text-retrieval llm semantic-segmentation video-classification vision-language-model vit-22b vit-6b multi-modal gpt gpt-4v gpt-4o

Python 7.48 k

1 day ago

modelscope / agentscope

#LLM# Start building LLM-empowered multi-agent applications in an easier way.

agent chatbot gpt-4 large-language-models llm llm-agent multi-agent distributed-agents multi-modal llama3 gpt-4o drag-and-drop mcp

Python 6.98 k

3 days ago

THUDM / CogVLM

A state-of-the-art-level open visual language model | Multimodal pretrained model

cross-modality language-model multi-modal pretrained-models visual-language-models

Python 6.47 k

10 months ago

lucidrains / DALLE-pytorch

#Computer science# Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

artificial-intelligence deep-learning attention-mechanism text-to-image transformers multi-modal

Python 5.61 k

1 year ago

OFA-Sys / Chinese-CLIP

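#Natural language processing# This project is the Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly implement image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and similar tasks in the Chinese domain.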

chinese computer-vision multi-modal-learning natural-language-processing PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models

Python 5.07 k

8 months ago

marqo-ai / marqo

#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

deep-learning information-retrieval machine-learning vector-search tensor-search clip multi-modal search-engine transformers vision-language semantic-search visual-search natural-language-processing hnsw knn Hacktoberfest ChatGPT gpt large-language-models

Python 4.82 k

7 hours ago

valhalla / valhalla

Open Source Routing Engine for OpenStreetMap

OpenStreetMap dijkstra astar tiled directions isochrones multi-modal traveling-salesman routing-engine routing

C++ 4.77 k

1 day ago

modelscope / data-juicer

#Natural language processing# Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 4.17 k

1 day ago

THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model

chatglm-6b gpt multi-modal

Python 4.15 k

8 months ago

VectorSpaceLab / OmniGen

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

diffusion Image image-generation multi-modal image-edit

Jupyter Notebook 3.92 k

2 months ago

zjunlp / DeepKE

#Natural language processing# [EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

knowledge-graph relation-extraction chinese named-entity-recognition attribute-extraction low-resource document-level information-extraction PyTorch deepke ner natural-language-processing few-shot prompt deep-learning multi-modal

Python 3.86 k

1 month ago

PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

instruction-tuning large-vision-language-model multi-modal

Python 3.22 k

4 months ago

SciSharp / LLamaSharp

#LLM# A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

chatbot gpt llama llamacpp llm semantic-kernel llava multi-modal llama2 llama3 llama-cpp

C# 3.11 k

2 days ago

docarray / docarray

#Computer science# Represent, send, store and search multimodal data

Translation - The data structure for unstructured data

docarray data-structures multimodal cross-modal neural-search deep-learning nested-data qdrant weaviate nearest-neighbor-search protobuf elasticsearch multi-modal semantic-search machine-learning PyTorch FastAPI pydantic

Python 3.04 k

22 days ago

THUDM / CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

cogvlm pretrained-models language-model multi-modal

Python 2.33 k

1 month ago

open-compass / VLMEvalKit

#LLM# Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

gpt-4v large-language-models llava multi-modal openai vqa llm openai-api qwen gpt computer-vision PyTorch gpt4 ChatGPT clip vit evaluation claude gemini

Python 2.19 k

11 hours ago

PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

large-vision-language-model mixture-of-experts moe multi-modal

Python 2.14 k

4 months ago