A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it
WebUI extension for using BLIP2
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
#Large Language Models# Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
Experiments with the LAVIS library for image-to-text and text-to-image retrieval using BLIP and BLIP2 models
#Large Language Models# (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions