#Computer Science# 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Maps variable-length sentences to fixed-length vectors using a BERT model.
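The two blurbs above describe a general pattern: turn a variable-length input into one fixed-length vector, then rank candidates by similarity to a query vector. The sketch below shows that pattern with mean pooling and cosine similarity. It is not the implementation used by either project. The per-token vectors are hypothetical stand-ins for encoder outputs, and `mean_pool`, `cosine` and `rank` are illustrative names, not part of any library API.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average per-token vectors into a single fixed-length vector.

    Assumption: token_vectors are hypothetical per-token embeddings that all
    share one dimension d. The output has length d no matter how many tokens
    the sentence contains.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    sums = [0.0] * dim
    for vec in token_vectors:
        if len(vec) != dim:
            raise ValueError("all token vectors must share one dimension")
        for i, x in enumerate(vec):
            sums[i] += x
    return [s / len(token_vectors) for s in sums]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: Vector, candidates: List[Vector]) -> List[int]:
    """Return candidate indices sorted by descending cosine similarity."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # A 2-token and a 3-token "sentence" both pool to length-2 vectors.
    short = mean_pool([[1.0, 0.0], [0.0, 1.0]])
    long = mean_pool([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
    print(short, long)
    print(rank([1.0, 0.0], [short, long]))
```

Mean pooling is only one common way to get a fixed-length vector. Real systems may instead use a special token's output or learned pooling, and usually compute similarities with batched matrix operations rather than Python loops.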
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction