A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT-2, EMNLP 2022 (Findings)
Audio Captioning datasets for PyTorch.
VisText is a benchmark dataset for semantically rich chart captioning.
Fully-Convolutional Point Networks for Large-Scale Point Clouds
A Tennis dataset and models for event detection & commentary generation
Python code for handling the Clotho dataset.
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey
A Base TensorFlow Project for Medical Report Generation
Medical image captioning using OpenAI's CLIP
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023)
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A Gradio-based image captioning tool that uses the GPT-4 Vision API to generate detailed descriptions of images.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
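Captioning metrics like the ones above typically score n-gram overlap between a candidate caption and one or more references. As a minimal, illustrative sketch (pure Python, not the API of any listed toolkit), sentence-level BLEU-1 combines clipped unigram precision with a brevity penalty:

```python
import math
from collections import Counter

def bleu1(candidate: str, references: list[str]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]

    # Clip each candidate token count by its maximum count in any reference.
    cand_counts = Counter(cand)
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand) if cand else 0.0

    # Brevity penalty against the reference closest in length to the candidate.
    ref_len = min((len(r) for r in refs), key=lambda l: (abs(l - len(cand)), l))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * precision

print(bleu1("a dog runs in the park", ["a dog is running in the park"]))
```

Production toolkits add higher-order n-grams, corpus-level aggregation, and caption-specific metrics such as CIDEr and SPICE; this sketch only shows the shape of the computation.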
A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
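The Attention on Attention (AoA) idea adds a gating step on top of standard attention: from the query q and the attended result v̂ it computes an information vector and a sigmoid gate, and returns their element-wise product. A minimal NumPy sketch of that mechanism (dimensions, weight names, and the single-query setting are illustrative assumptions, not the repo's API):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aoa(q, k, v, w_i, w_g, b_i, b_g):
    """Attention on Attention for a single query vector.

    q: (d,) query; k, v: (n, d) keys/values;
    w_i, w_g: (2d, d) projections for the information vector and the gate.
    """
    # Standard scaled dot-product attention.
    scores = softmax(k @ q / np.sqrt(q.shape[0]))
    v_hat = scores @ v                               # attended result, (d,)

    qv = np.concatenate([q, v_hat])                  # concat query and result, (2d,)
    info = qv @ w_i + b_i                            # information vector
    gate = 1.0 / (1.0 + np.exp(-(qv @ w_g + b_g)))   # sigmoid attention gate
    return gate * info                               # element-wise gated output

rng = np.random.default_rng(0)
d, n = 8, 5
out = aoa(rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d)),
          rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)),
          np.zeros(d), np.zeros(d))
print(out.shape)
```

The gate lets the model suppress attention results that are irrelevant to the query, which is the module's motivation in the image-captioning setting.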
Using LLMs and pre-trained caption models for super-human performance on image captioning.