#Computer Science#🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Translation - Map variable-length sentences to fixed-length vectors using a BERT model
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
#Awesome#The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...
#Natural Language Processing#Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss 🐾 https://arxiv.org/abs/1711.05535
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
#Computer Science#Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)