#Computer Science#LAVIS - A One-stop Library for Language-Vision Intelligence
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
#Large Language Models#InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...
Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
#Large Language Models#Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
#iOS#Simple Swift class that provides all the configurations you need to create a custom camera view in your app
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...
Tag manager and captioner for image datasets
TensorFlow Implementation of "Show, Attend and Tell"
#Natural Language Processing#👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations
#Computer Science#PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)
Meshed-Memory Transformer for Image Captioning. CVPR 2020
#Natural Language Processing#Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain
#Large Language Models#Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation