#Computer Science#An open source implementation of CLIP.
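#Natural Language Processing#A Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly implement image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and similar tasks in the Chinese domain.
#Natural Language Processing#Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration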
🥂 Gracefully face hCaptcha challenge with MoE(ONNX) embedded solution.
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
#Computer Science#A concise but complete implementation of CLIP with various experimental improvements from recent papers
#Awesome#A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
#Computer Science#Build high-performance AI models with modular building blocks
#Face Recognition#CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement
#Computer Science#[CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation
#Data Warehouse#Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.
The official repository of Achelous and Achelous++
[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"
A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).
[CVPR 2024] Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification