A simple script that reads a directory of videos, grabs a random frame, and automatically discovers a prompt for it
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
WebUI extension for using BLIP2
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Chat with NeRF enables users to interact with a NeRF model by typing in natural language.
Experiments with the LAVIS library to perform image2text and text2image retrieval with BLIP and BLIP2 models
(AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
BLIP2 captioning tool as an extension of AUTOMATIC's WebUI