multi-modal-learning · GitHub Topics

mlfoundations / open_clip

#Computer Science# An open source implementation of CLIP.

deep-learning PyTorch computer-vision language-model multi-modal-learning contrastive-loss zero-shot-classification pretrained-models

Python 11.52 k

7 days ago

OFA-Sys / Chinese-CLIP

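#Natural Language Processing# A Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs), intended to help users quickly perform image-text feature and similarity computation, cross-modal retrieval, and zero-shot image classification in Chinese-language settings.

chinese computer-vision multi-modal-learning natural-language-processing PyTorch vision-and-language-pre-training image-text-retrieval clip pretrained-models vision-language deep-learning multi-modal contrastive-loss transformers coreml-models

Python 5.07 k

8 months ago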

lyuchenyang / Macaw-LLM

#Natural Language Processing# Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration

language-model multi-modal-learning natural-language-processing deep-learning machine-learning neural-networks

Python 1.56 k

3 months ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python 1.31 k

1 year ago

lucidrains / x-clip

#Computer Science# A concise but complete implementation of CLIP with various experimental improvements from recent papers

artificial-intelligence deep-learning contrastive-learning zero-shot-learning multi-modal-learning

Python 707

1 year ago

jokieleung / awesome-visual-question-answering

#Awesome# A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Awesome Lists vqa multi-modal multi-modal-learning

662

2 years ago

OpenRobotLab / EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

3d-vision computer-vision multi-modal-learning Robotics

Python 581

2 months ago

kyegomez / zeta

#Computer Science# Build high-performance AI models with modular building blocks

artificial-intelligence multi-modal transformers deep-learning gpt4 llama2 multi-agent-systems multi-modal-learning multi-platform PyTorch speech-recognition transformer

Python 492

5 days ago

DmitryRyumin / CVPR-2023-24-Papers

#Face Recognition# CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included...

action-recognition autonomous-driving biometrics computer-vision cvpr cvpr2023 dataset deep-learning face-recognition gesture-recognition image-synthesis medical-image-processing multi-modal-learning pattern-recognition segmentation self-supervised-learning video-synthesis cvpr2024

Python 448

9 months ago

zjukg / KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

cross-modal-retrieval Entity resolution image-classification image-generation information-extraction knowledge-graph knowledge-graph-embeddings large-language-models multi-modal-learning paper-list survey surveys visual-question-answering awsome

401

4 months ago

zhengli97 / PromptKD

[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"

cvpr2024 multi-modal-learning prompt-learning vision-language-model knowledge-distillation clip

Python 288

1 month ago

Ysz2022 / NeRCo

[ICCV 2023] Implicit Neural Representation for Cooperative Low-light Image Enhancement

neural-representation multi-modal-learning iccv iccv2023

Python 243

1 year ago

moabarar / nemar

#Computer Science# [CVPR2020] Unsupervised Multi-Modal Image Registration via Geometry Preserving Image-to-Image Translation

multimodal image-to-image-translation multi-modal multi-modal-learning affine-transformation deep-learning cnn PyTorch image-registration cvpr2020

Python 183

5 years ago

huggingface / chug

#Data Warehouse# Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets.

computer-vision dataset distributed-training document-understanding multi-modal-learning pdf-document

Python 157

1 year ago

GuanRunwei / Achelous

The official repository of Achelous and Achelous++

multi-modal-learning multi-task-learning object-detection object-tracking point-cloud-segmentation semantic-segmentation

Python 150

9 months ago

qizekun / ReCon

[ICML 2023] Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

Point cloud multi-modal-learning representation-learning self-supervised-learning

Python 143

9 months ago

wjun0830 / CGDETR

Official PyTorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

computer-vision detr multi-modal-learning PyTorch video-understanding

Python 127

8 months ago

shikras / d-cube

A detection/segmentation dataset with labels characterized by intricate and flexible expressions. "Described Object Detection: Liberating Object Detection with Flexible Expressions" (NeurIPS 2023).

multi-modal-learning object-detection referring-expression-comprehension vision-language dataset open-vocabulary-detection

Python 117

1 year ago