PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
#NLP# This project is a Chinese version of the CLIP model, trained on large-scale Chinese data (~200 million image-text pairs). It aims to help users quickly implement Chinese-domain image-text feature and similarity computation, cross-modal retrieval, zero-shot image classification, and related tasks (a generic sketch of this workflow appears after this list).
Recent Advances in Vision and Language Pre-training (VLP)
A curated list of vision-and-language pre-training (VLP). :-)
Code Implementation of "Simple Image-level Classification Improves Open-vocabulary Object Detection" (AAAI'24)
Companion repo for the Vision Language Modelling YouTube series (https://bit.ly/3PsbsC2) by Prithivi Da. Open to PRs and collaborations.
Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR2022)
A list of research papers on knowledge-enhanced multimodal learning
The official implementation for the ICCV 2023 paper "Grounded Image Text Matching with Mismatched Relation Reasoning".
#NLP# Korean version of CLIP, which achieves Korean cross-modal retrieval and representation generation.
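
For orientation, here is a minimal sketch of the CLIP-style similarity computation and zero-shot classification workflow referenced in the Chinese-CLIP entry above. The encoders, feature dimensions, and temperature value are placeholder assumptions for illustration, not any listed project's actual API.

```python
# Generic CLIP-style image-text similarity / zero-shot classification sketch.
# The encoders below are stand-in linear projections, not real pretrained models.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical projections from modality-specific features into a shared space.
image_encoder = torch.nn.Linear(2048, 512)   # e.g. pooled ViT/CNN features -> joint space
text_encoder = torch.nn.Linear(768, 512)     # e.g. pooled text features -> joint space

# Dummy inputs: 1 image and 3 candidate class prompts, already featurized.
image_feats = torch.randn(1, 2048)
text_feats = torch.randn(3, 768)

# Project into the shared embedding space and L2-normalize.
img_emb = F.normalize(image_encoder(image_feats), dim=-1)
txt_emb = F.normalize(text_encoder(text_feats), dim=-1)

# Cosine similarity (dot product of unit vectors), scaled by a temperature.
logit_scale = 100.0
logits = logit_scale * img_emb @ txt_emb.t()   # shape: (1, 3)
probs = logits.softmax(dim=-1)                 # zero-shot class probabilities

print(probs)
```

The same scores can be reused for cross-modal retrieval by ranking candidate images (or texts) by their similarity to a query embedding instead of applying the softmax.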