image-captioning · GitHub Topics

#Computer Science# LAVIS - A One-stop Library for Language-Vision Intelligence

deep-learning deep-learning-library image-captioning salesforce vision-and-language vision-framework vision-language-pretraining vision-language-transformer visual-question-anwsering multimodal-datasets multimodal-deep-learning

Jupyter Notebook 10.44 k

5 months ago

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

vision-language vision-and-language-pre-training image-text-retrieval image-captioning visual-question-answering vision-language-transformer

Jupyter Notebook 5.18 k

8 months ago

OpenGVLab / InternGPT

#Large Language Models# InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, ...

ChatGPT foundation-model gpt gpt-4 gradio husky image-captioning langchain large-language-models multimodal vqa llama vicuna video-generation sam segment-anything click draggan

Python 3.22 k

8 months ago

sgrvinod / a-PyTorch-Tutorial-to-Image-Captioning

Show, Attend, and Tell | a PyTorch Tutorial to Image Captioning

PyTorch pytorch-tutorial show-attend-and-tell image-captioning encoder-decoder attention-mechanism computer-vision mscoco

Python 2.83 k

3 years ago

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

multimodal pretraining image-captioning text-to-image-synthesis visual-question-answering referring-expression-comprehension vision-language pretrained-models prompt prompt-tuning chinese

Python 2.49 k

1 year ago

ttengwang / Caption-Anything

#Large Language Models# Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/space...

ChatGPT controllable-generation segment-anything controllable-image-captioning image-captioning

Python 1.73 k

2 years ago

peteanderson80 / bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

vqa visual-question-answering faster-rcnn caffe image-captioning mscoco

Jupyter Notebook 1.45 k

2 years ago

imaginary-cloud / CameraManager

#iOS# Simple Swift class that provides all the configuration you need to create a custom camera view in your app

Swift iOS camera video-recording image-captioning cocoapods swift-package-manager carthage qrcode-reader

Swift 1.38 k

9 months ago

NVlabs / prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

image-captioning language-model multi-modal-learning multi-task-learning vision-language-model vision-and-language vqa

Python 1.31 k

1 year ago

microsoft / Oscar

Oscar and VinVL

vision-and-language pre-training image-captioning vqa oscar

Python 1.05 k

2 years ago

ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and others.

image-captioning

Python 1 k

2 years ago

YehLi / xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense r...

image-captioning video-captioning vision-and-language pretraining cross-modal-retrieval visual-question-answering tden

Python 970

2 years ago

jhc13 / taggui

Tag manager and captioner for image datasets

image-captioning pyside6 stable-diffusion llava cogvlm florence-2

Python 959

2 months ago

yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"

Tensorflow image-captioning show-attend-and-tell attention-mechanism

Jupyter Notebook 907

7 years ago

SkalskiP / awesome-foundation-and-multimodal-models

#Natural Language Processing# 👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

blip clip foundational-models grounding-dino llava multimodal segment-anything computer-vision natural-language-processing open-vocabulary-detection open-vocabulary-segmentation image-captioning

Python 611

1 year ago

kdexd / virtex

[CVPR 2021] VirTex: Learning Visual Representations from Textual Annotations

image-captioning coco-dataset pretrained-models model-zoo cvpr2021

Python 561

1 year ago

kuanghuei / SCAN

#Computer Science# PyTorch source code for "Stacked Cross Attention for Image-Text Matching" (ECCV 2018)

cross-modal image-captioning neural-networks deep-learning PyTorch computer-vision

Python 559

2 years ago

aimagelab / meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

image-captioning transformer PyTorch cvpr2020

Python 532

2 years ago

subho406 / OmniNet

#Natural Language Processing# Official PyTorch implementation of "OmniNet: A unified architecture for multi-modal multi-task learning" | Authors: Subhojeet Pramanik, Priyanka Agrawal, Aman Hussain

machine-learning deep-learning neural-networks artificial-intelligence transformer natural-language-processing image-captioning video-recognition multitask-learning multimodal-learning

Python 512

4 years ago

gokayfem / ComfyUI_VLM_nodes

#Large Language Models# Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

nodes comfyui custom-nodes llava large-language-models image-captioning mllm vlm

Python 484

2 months ago