A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
LAVIS - A One-stop Library for Language-Vision Intelligence
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Build multimodal language agents for fast prototyping and production
Code for ALBEF: a new vision-language pre-training method
Multimodal-GPT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Real-time and accurate open-vocabulary end-to-end object detection
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
日本語LLMまとめ - Overview of Japanese LLMs
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Codebase for Aria - an Open Multimodal Native MoE
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, and visual commonsense reasoning).
My Reading Lists of Deep Learning and Natural Language Processing
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".