#Large Language Models#[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
#Natural Language Processing#awesome grounding: A curated list of research papers in visual grounding
#Natural Language Processing#[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
[CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
#Natural Language Processing#[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
PyTorch Implementation of Consensus-based Sequence Training for Video Captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
A tool for downloading from public image boards (which allow scraping) / preview your images & tags / edit your images & tags. Additional tabs for downloading other desired code repositories as well a...
Transcription and annotation interface for recorded audio or video files
#Natural Language Processing#An image and video description generator using a CNN-RNN-based architecture.
M-VAD Names Dataset. Multimedia Tools and Applications (2019)
Caption generator for live camera feed
Sample app to add captions to an uploaded video. From api.video (https://api.video)
Online professional courses that are captioned and/or subtitled
Video Search using Natural Language
Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP 2025)
#Large Language Models#Generate TikTok- and Instagram-tailored captions and hashtags for your videos using the power of some super creative robots up in the clouds ☁️ 🤖 💬 ☁️
A multilingual automatic speech recognition and video captioning tool using faster-whisper. Supports real-time translation to English. Runs on a consumer-grade CPU.