Emu Series: Generative Multimodal Models from BAAI
Awesome: The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insigh...
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)