#Awesome#Famous Vision Language Models and Their Architectures
#Natural Language Processing#👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
#Large Language Models#Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
PyTorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
This repository provides an interactive image colorization tool that leverages Stable Diffusion (SDXL) and BLIP for user-controlled color generation. With a retrained model using the ControlNet approa...
#Computer Science#A data discovery and manipulation toolset for unstructured data
Image captioning using Python and BLIP
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
FiveM script that lets civilians dial 911, sharing their location, name, and reason for calling, and adds a blip to the map
#Computer Science#Collection of OSS models that are containerized into a serving container
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and / or *your* own words!
SAM + CLIP + Diffusion for editing objects in images using plain text
oCaption: Leveraging OpenAI's GPT-4 Vision for Advanced Image Captioning
#Computer Science#Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras.