A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019
CapDec: SOTA Zero-Shot Image Captioning Using CLIP and GPT-2, EMNLP 2022 (Findings)
Audio Captioning datasets for PyTorch.
VisText is a benchmark dataset for semantically rich chart captioning.
Fully-Convolutional Point Networks for Large-Scale Point Clouds
A Tennis dataset and models for event detection & commentary generation
Python code for handling the Clotho dataset.
Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey
A Base TensorFlow Project for Medical Report Generation
Medical image captioning using OpenAI's CLIP
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (CVPR 2023)
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A Gradio-based image captioning tool that uses the GPT-4 Vision API to generate detailed descriptions of images.
Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.
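Captioning metrics like the ones above typically score n-gram overlap between a candidate caption and one or more references. As a minimal, illustrative sketch (pure Python, not the API of any listed toolkit), sentence-level BLEU-1 combines clipped unigram precision with a brevity penalty:

```python
import math
from collections import Counter

def bleu1(candidate: str, references: list[str]) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]

    # Clip each candidate token count by its maximum count in any reference.
    cand_counts = Counter(cand)
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand) if cand else 0.0

    # Brevity penalty against the reference closest in length to the candidate.
    ref_len = min((len(r) for r in refs), key=lambda l: (abs(l - len(cand)), l))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * precision

print(bleu1("a dog runs in the park", ["a dog is running in the park"]))
```

Production toolkits add higher-order n-grams, corpus-level aggregation, and caption-specific metrics such as CIDEr and SPICE; this sketch only shows the shape of the computation.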
A PyTorch implementation of the Attention on Attention module (both self and guided variants) for Visual Question Answering
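The Attention on Attention (AoA) idea adds a gating step on top of standard attention: from the query q and the attended result v̂ it computes an information vector and a sigmoid gate, and returns their element-wise product. A minimal NumPy sketch of that mechanism (dimensions, weight names, and the single-query setting are illustrative assumptions, not the repo's API):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def aoa(q, k, v, w_i, w_g, b_i, b_g):
    """Attention on Attention for a single query vector.

    q: (d,) query; k, v: (n, d) keys/values;
    w_i, w_g: (2d, d) projections for the information vector and the gate.
    """
    # Standard scaled dot-product attention.
    scores = softmax(k @ q / np.sqrt(q.shape[0]))
    v_hat = scores @ v                               # attended result, (d,)

    qv = np.concatenate([q, v_hat])                  # concat query and result, (2d,)
    info = qv @ w_i + b_i                            # information vector
    gate = 1.0 / (1.0 + np.exp(-(qv @ w_g + b_g)))   # sigmoid attention gate
    return gate * info                               # element-wise gated output

rng = np.random.default_rng(0)
d, n = 8, 5
out = aoa(rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d)),
          rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)),
          np.zeros(d), np.zeros(d))
print(out.shape)
```

The gate lets the model suppress attention results that are irrelevant to the query, which is the module's motivation in the image-captioning setting.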
Using LLMs and pre-trained caption models for super-human performance on image captioning.