[ICLR 2023] Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
[NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph Generation.
Graduate research exploring transformer behavior
Official code repository for the paper "Large-scale cross-modality pretrained model enhances cardiovascular state estimation and cardiomyopathy detection from electrocardiograms: An AI system develop...
Counting dataset for Vision & Language models. Introduced in the paper "Seeing Past Words: Testing the Cross-Modal Capabilities of Pretrained V&L Models". https://arxiv.org/abs/2012.12352
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, intended as a preliminary overview.
Cross-Modal Center Loss for 3D Cross-Modal Retrieval (CVPR 2021)
PyTorch original implementation of Cross-lingual Language Model Pretraining.
Basic model for cross-modal retrieval
Three-dimensional cross-modal image inference
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining (CVPR 2022)
VideoX: a collection of video cross-modal models
Cross-modal few-shot adaptation with CLIP
Deep learning cross modal hashing in PyTorch
Audio-Visual Speech Separation with Cross-Modal Consistency
PyTorch implementation for the paper "Deep Cross-Modal Hashing"
Source code for the paper "Deep Cross-Modal Hashing"
TensorFlow Implementation of Deep Cross-Modal Projection Learning
Scene Text Aware Cross Modal Retrieval (StacMR)
Python implementation of cross-modal hashing algorithms
Multi-modal learning toolkit based on PaddlePaddle and PyTorch, supporting multiple applications such as multi-modal classification, cross-modal retrieval, and image captioning.
The baselines of cross-modal hashing retrieval.
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Adaptive Cross-Modal Embeddings for Image-Sentence Alignment