[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
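#Search# Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
#Natural Language Processing# Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly implement image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and similar tasks for Chinese-language content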
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
#Large Language Models# [ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
#Large Language Models# 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
#Natural Language Processing# CLIPort: What and Where Pathways for Robotic Manipulation
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction