🔥🔥🔥 Latest Papers, Code, and Datasets on Vid-LLMs.
OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark
PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
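The SlowFast recipe behind that codebase is easy to convey in a few lines: one pathway sees a temporally sparse clip, the other a dense one. A minimal PyTorch sketch of the input sampling only (the `alpha=4` speed ratio and the clip shape are assumptions; PySlowFast's configs define the real values):

```python
import torch

def sample_pathways(frames: torch.Tensor, alpha: int = 4):
    """Split one decoded clip into SlowFast's two pathway inputs.

    frames: (C, T, H, W) RGB clip. The Fast pathway keeps every frame;
    the Slow pathway subsamples the temporal axis by `alpha`.
    """
    fast = frames
    slow = frames[:, ::alpha]
    return slow, fast

clip = torch.randn(3, 32, 224, 224)    # toy clip: 32 frames of 224x224 RGB
slow, fast = sample_pathways(clip)
print(slow.shape, fast.shape)          # (3, 8, 224, 224) and (3, 32, 224, 224)
```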
A collection of recent video understanding datasets, under construction!
Code and models of the paper "ECO: Efficient Convolutional Network for Online Video Understanding", ECCV 2018
A deep learning library for video understanding research.
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
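The shift itself is small enough to sketch: move a fraction of channels one step along the temporal axis in each direction, at zero extra FLOPs. A minimal PyTorch version (the `fold_div=8` split mirrors the paper's default; how the shift is fused into residual blocks differs in the official code):

```python
import torch

def temporal_shift(x: torch.Tensor, fold_div: int = 8) -> torch.Tensor:
    """Zero-padded temporal shift over (N, T, C, H, W) features."""
    n, t, c, h, w = x.size()
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                   # 1/8 of channels look ahead
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]   # 1/8 look behind
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # the rest stay in place
    return out
```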
Official code for the Goldfish model (long video understanding) and MiniGPT4-video (short video understanding)
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
[ECCV 2024] Video Foundation Models & Data for Multimodal Understanding
Long-Term Feature Banks for Detailed Video Understanding
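The core mechanism is attention from short-term clip features onto a bank of features precomputed over the whole video. A simplified sketch (the paper's feature bank operator is a non-local block; plain scaled dot-product attention here only conveys the shape of the computation):

```python
import torch
import torch.nn.functional as F

def feature_bank_read(clip_feats: torch.Tensor, bank: torch.Tensor) -> torch.Tensor:
    """Attend current-clip features over a long-term feature bank.

    clip_feats: (N, D) features of the short clip being processed.
    bank:       (M, D) features cached from clips across the whole video.
    Returns (N, D) long-term context to fuse with the short-term features.
    """
    attn = F.softmax(clip_feats @ bank.t() / bank.size(1) ** 0.5, dim=-1)  # (N, M)
    return attn @ bank                                                    # (N, D)
```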
PaddlePaddle models for the YouTube-8M Video Understanding Challenge
A collection of papers and notes on video person re-identification
Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
TensorFlow code for finetuning the I3D model on UCF101.
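In outline, such finetuning swaps the Kinetics classification head for a 101-way one and trains on UCF101 clips. A hedged Keras sketch: the small Conv3D trunk below is a runnable stand-in for the real Kinetics-pretrained I3D network, and the clip shape and hyperparameters are assumptions, not the repo's actual setup:

```python
import tensorflow as tf

# Stand-in trunk: in practice this is the Kinetics-pretrained I3D network.
backbone = tf.keras.Sequential([
    tf.keras.layers.Conv3D(64, 7, strides=2, padding="same", activation="relu"),
    tf.keras.layers.GlobalAveragePooling3D(),
])
backbone.trainable = False          # with real pretrained weights, freeze the trunk

inputs = tf.keras.Input(shape=(64, 224, 224, 3))       # (T, H, W, C) RGB clip
logits = tf.keras.layers.Dense(101)(backbone(inputs))  # UCF101 has 101 classes
model = tf.keras.Model(inputs, logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(...) with a UCF101 clip pipeline then trains the new head.
```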
video-understanding: Video Classification, Action Recognition, Video Datasets
1st place solution to Kaggle's 2018 YouTube-8M Video Understanding Challenge
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
[ICLR 2022] TAda! Temporally-Adaptive Convolutions for Video Understanding. This codebase provides solutions for video classification, video representation learning and temporal detection.
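The temporally-adaptive idea reduces to modulating one shared base kernel with a factor predicted per frame. A heavily simplified PyTorch sketch (the real TAdaConv uses a more elaborate calibration branch; the linear calibration and shapes here are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TAdaConv2dSketch(nn.Module):
    """Toy temporally-adaptive conv: a shared base kernel scaled,
    per frame, by a calibration factor predicted from that frame."""

    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.base = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.01)
        self.calib = nn.Linear(c_in, c_out)    # per-frame channel calibration
        self.c_out, self.k = c_out, k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, t, c, h, w = x.shape                # x: (N, T, C, H, W)
        outs = []
        for i in range(t):
            frame = x[:, i]                                  # (N, C, H, W)
            alpha = self.calib(frame.mean(dim=(2, 3)))       # (N, C_out)
            weight = self.base.unsqueeze(0) * alpha.view(n, self.c_out, 1, 1, 1)
            # per-sample adaptive kernels via the grouped-conv trick
            y = F.conv2d(frame.reshape(1, n * c, h, w),
                         weight.reshape(n * self.c_out, c, self.k, self.k),
                         padding=self.k // 2, groups=n)
            outs.append(y.reshape(n, self.c_out, h, w))
        return torch.stack(outs, dim=1)                      # (N, T, C_out, H, W)

x = torch.randn(2, 8, 16, 14, 14)
print(TAdaConv2dSketch(16, 32)(x).shape)       # torch.Size([2, 8, 32, 14, 14])
```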
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.