[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
#Large Language Model# (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions
Automate fashion image captioning using BLIP-2: automatically generate descriptions of clothes on shopping websites, helping customers without fashion knowledge better understand the features ...
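For reference, generating such a caption with BLIP-2 through Hugging Face Transformers looks roughly like this; the checkpoint and the product-photo path are assumptions, not taken from the repo:

```python
# Minimal BLIP-2 captioning sketch with Hugging Face Transformers.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

image = Image.open("dress.jpg")  # hypothetical product photo
inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(out[0], skip_special_tokens=True))
```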
#Computer Science# Implementation of the Q-Former from BLIP-2 in Zeta Lego blocks.
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
A multimodal model for Vietnamese Visual Question Answering (ViVQA)
CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and/or *your* own words!
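The underlying idea, ranking candidate phrases by CLIP image-text similarity, can be sketched in plain Transformers; the model checkpoint and phrase list below are illustrative:

```python
# Score candidate phrases against an image and keep the best matches.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

image = Image.open("photo.jpg")  # hypothetical input image
phrases = ["oil painting", "studio photograph", "pixel art", "watercolor"]

inputs = processor(text=phrases, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image[0]  # one similarity score per phrase
best = [phrases[i] for i in logits.topk(2).indices.tolist()]
print(", ".join(best))
```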
Modifying LAVIS' BLIP-2 Q-Former with models pretrained on Japanese datasets.
#Large Language Model# This repository profiles, extracts, visualizes, and reuses generative AI weights, with the aim of building more accurate AI models and auditing/scanning weights at rest to identify knowledge domains for ...
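As a rough illustration of what "profiling weights at rest" can mean, here is a minimal sketch over a plain PyTorch state-dict checkpoint; the file name and chosen statistics are assumptions, not this repository's actual method:

```python
# Print per-tensor shape and summary statistics for a checkpoint on disk.
import torch

state = torch.load("checkpoint.pt", map_location="cpu")  # hypothetical file
for name, tensor in state.items():
    if torch.is_tensor(tensor) and tensor.is_floating_point():
        print(f"{name:60s} shape={tuple(tensor.shape)} "
              f"mean={tensor.mean().item():+.4f} std={tensor.std().item():.4f}")
```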
#Data Warehouse# Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost
Caption images across your datasets with state-of-the-art models from Hugging Face and Replicate!
Fine-tuning Large Visual Models for Visual Question Answering
A caption generator using LAVIS and argostranslate
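A minimal sketch of that combination, assuming LAVIS's blip_caption model and an already-installed en→es Argos Translate language package; the image path and language pair are illustrative:

```python
# Caption an image with LAVIS, then translate the caption with Argos Translate.
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
import argostranslate.translate

device = "cuda" if torch.cuda.is_available() else "cpu"
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

raw = Image.open("photo.jpg").convert("RGB")  # hypothetical input image
image = vis_processors["eval"](raw).unsqueeze(0).to(device)
caption = model.generate({"image": image})[0]

# Assumes the en->es Argos Translate package was installed beforehand.
print(argostranslate.translate.translate(caption, "en", "es"))
```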
Explores visual question answering with the Gemini LLM, where the input image can be supplied as a URL or as a file in any common format.
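A minimal sketch of URL-based VQA with the google-generativeai SDK; the API-key handling, model name, image URL, and question are all assumptions:

```python
# Fetch an image from a URL and ask Gemini a question about it.
import io
import requests
from PIL import Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key supplied directly
model = genai.GenerativeModel("gemini-1.5-flash")

url = "https://example.com/photo.jpg"  # hypothetical image URL
image = Image.open(io.BytesIO(requests.get(url, timeout=30).content))

answer = model.generate_content(["How many people are in this photo?", image])
print(answer.text)
```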
Too lazy to organize my desktop, so make GPT + BLIP-2 do it /s
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models.
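A minimal sketch of the retrieval half of such a pipeline, assuming pdf2image (which requires Poppler) and a CLIP model from Transformers; the PDF path, query, and model choice are illustrative:

```python
# Render PDF pages to images, embed them with CLIP, and retrieve the page
# most relevant to a text query. The VLM analysis step is left as a stub.
import torch
from pdf2image import convert_from_path
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

pages = convert_from_path("report.pdf")  # hypothetical file; one image per page

with torch.no_grad():
    img_inputs = processor(images=pages, return_tensors="pt")
    page_emb = model.get_image_features(**img_inputs)
    page_emb = page_emb / page_emb.norm(dim=-1, keepdim=True)

    query = "quarterly revenue chart"  # hypothetical query
    txt_inputs = processor(text=[query], return_tensors="pt", padding=True)
    q_emb = model.get_text_features(**txt_inputs)
    q_emb = q_emb / q_emb.norm(dim=-1, keepdim=True)

best_page = int((page_emb @ q_emb.T).argmax())
print(f"Most relevant page: {best_page + 1}")
# Next step: pass pages[best_page] to a vision-language model for analysis.
```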