visual-language-models · GitHub Topics

THUDM / CogVLM

A state-of-the-art open visual language model | multimodal pretrained model

cross-modality language-model multi-modal pretrained-models visual-language-models

Python 6.47 k

10 months ago

camel-ai / crab

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

language-model-agent large-language-models multi-agent-systems visual-language-models

Python 333

5 months ago

bilel-bj / ROSGPT_Vision

#Large language models# Commanding robots using only Language Models' prompts

prompt-engineering Robotics ros2 ChatGPT language-models language-models-are-next large-language-models Large language models visual-language-models

Python 99

2 months ago

hk-zh / language-conditioned-robot-manipulation-models

https://arxiv.org/abs/2312.10807

foundation-models imitation-learning reinforcement-learning visual-language-models robot-manipulation

4 months ago

xinyanghuang7 / Basic-Visual-Language-Model

Build a simple, basic multimodal large model from scratch 🤖

large-language-models visual-language-learning visual-language-models

Python 34

10 months ago

AlignGPT-VL / AlignGPT

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

large-language-models multimodal-large-language-models visual-language-models

Python 32

9 months ago

tianyu-z / VCR

#Computer science# Official repo for the paper "VCR: Visual Caption Restoration". Check arxiv.org/pdf/2406.06462 for details.

benchmark Deep learning visual-language-models

Python 31

1 month ago

jaisidhsingh / CoN-CLIP

#Computer science# Implementation of the "Learn No to Say Yes Better" paper.

compositionality Deep learning image-text-matching multimodal PyTorch visual-language-models

Python 31

1 month ago

Sid2697 / HOI-Ref

Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

dataset large-language-models visual-language-models vlm

Python 26

1 year ago

kesimeg / awesome-turkish-language-models

#Large language models# A curated list of Turkish AI models, datasets, and papers

large-language-models Large language models speech turkish visual-language-models vlm

2 days ago

amathislab / wildclip

Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models

behavior clip Machine vision visual-language-models

Python 22

1 year ago

BioMedIA-MBZUAI / FetalCLIP

Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Artificial intelligence foundation-models Medical imaging visual-language-models

Python 17

14 days ago

sduzpf / UAP_VLP

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

adversarial-attacks Deep neural networks visual-language-models

Python 13

13 days ago

csebuetnlp / IllusionVQA

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

visual-language-models vqa

Jupyter Notebook 13

6 months ago

declare-lab / Sealing

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

multimodality video-understanding video-question-answering visual-language-models

Python 11

9 months ago

GraphPKU / CoI

#Large language models# Chain of Images for Intuitively Reasoning

Chatbot ChatGPT gpt4v llama llava multimodal visual-language-models

Python 9

1 year ago

CristianoPatricio / concept-based-interpretability-VLM

#Computer science# Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).

clip Deep learning explainable-ai interpretability Medical imaging visual-language-models

Jupyter Notebook 9

10 months ago

ArthurBabkin / Parimate

#Natural language processing# A Telegram bot for validating audio and video content using CV models, SR models, and VLMs, with deepfake detection leveraging metadata analysis.

Machine vision deepfake-detection face-recognition liveness-detection mvp PostgreSQL speech-recognition Telegram visual-language-models audio-processing Natural language processing

Python 6

9 days ago

AikyamLab / hallucinogen

A benchmark for evaluating hallucinations in large visual language models

Artificial intelligence visual-language-models

Python 6

1 month ago

vlvink / PaliGemma-from-scratch

#Computer science# PaliGemma is a project created from scratch, based on a YouTube guide, to learn and demonstrate application/library/system creation. The project uses modern development approaches and best practices f...

Machine vision generative-ai language-model Machine learning visual-language-models vlm

Python 6

3 months ago