PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Recent Advances in Vision and Language Pre-training (VLP)
Vision-Language Pre-training for Image Captioning and Question Answering
Conceptual 12M is a dataset containing (image-URL, caption) pairs collected for vision-and-language pre-training.
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
A curated list of vision-and-language pre-training (VLP). :-)
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
[MICCAI-2022] This is the official implementation of Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training.
Pathology Language and Image Pre-Training (PLIP) is the first vision and language foundation model for Pathology AI (Nature Medicine). PLIP is a large-scale pre-trained model that can be used to extra...
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)
Code for ALBEF: a new vision-language pre-training method
Code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022
Grounded Language-Image Pre-training
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
[CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Code and model for the paper "Improving Language Understanding by Generative Pre-Training"
MASS: Masked Sequence to Sequence Pre-training for Language Generation
On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
OpenMMLab Pre-training Toolbox and Benchmark
CLIP (Contrastive Language-Image Pre-Training) in tensorflow
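Several of the listed repos (CLIP, SLIP, ALBEF, TCL) build on a symmetric image-text contrastive objective. The following is a minimal, dependency-free sketch of that loss over a precomputed similarity matrix; it is illustrative only, and the function names are my own, not taken from any of these codebases:

```python
import math

def softmax_xent(logits, target):
    # Cross-entropy of one row of logits against the target index,
    # computed with the usual max-subtraction for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return -math.log(exps[target] / z)

def clip_style_loss(sim):
    # sim[i][j]: scaled similarity between image i and text j;
    # matched (positive) pairs sit on the diagonal.
    n = len(sim)
    # image -> text: each image row should pick out its own caption
    img_to_txt = sum(softmax_xent(sim[i], i) for i in range(n)) / n
    # text -> image: each text column should pick out its own image
    txt_to_img = sum(softmax_xent([sim[j][i] for j in range(n)], i)
                     for i in range(n)) / n
    return (img_to_txt + txt_to_img) / 2
```

With uniform similarities the loss is log(n) (chance level), and it approaches zero as the diagonal entries dominate, which is the behavior the contrastive objective trains toward.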
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Code release for SLIP: Self-supervision meets Language-Image Pre-training
Implementations of some self-supervised methods for pre-training vision models
MPNet: Masked and Permuted Pre-training for Language Understanding https://arxiv.org/pdf/2004.09297.pdf
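Several entries above (MASS, MPNet, the multi-modal masked autoencoder) share a masked-prediction pre-training objective: hide a subset of input tokens and train the model to reconstruct them. A minimal sketch of the masking step alone (illustrative; the function and mask token are my own placeholders, not any repo's API):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", ratio=0.15, seed=0):
    # BERT-style masking: pick a random subset of positions (about
    # `ratio` of the sequence, at least one) and replace them with a
    # mask token; return the masked sequence and the chosen positions,
    # which serve as the prediction targets.
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * ratio))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [mask_token if i in positions else t
              for i, t in enumerate(tokens)]
    return masked, sorted(positions)
```

MASS masks contiguous spans rather than independent positions, and MPNet additionally permutes the prediction order; both are variations on this same hide-and-reconstruct recipe.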